Emergence of odd elasticity in a microswimmer using deep reinforcement learning

We use the Deep Q-Network with reinforcement learning to investigate the emergence of odd elasticity in an elastic microswimmer model. For an elastic microswimmer, it is challenging to obtain the optimized dynamics due to the intricate elastohydrodynamic interactions. However, our machine-trained model adopts a novel transition strategy (the waiting behavior) to optimize the locomotion. For the trained microswimmers, we evaluate the performance of the cycles by the product of the loop area (called non-reciprocality) and the loop frequency, and show that the average swimming velocity is proportional to the performance. By calculating the force-displacement correlations, we obtain the effective odd elasticity of the microswimmer to characterize its non-reciprocal dynamics. This emergent odd elasticity is shown to be closely related to the loop frequency of the cyclic deformation. Our work demonstrates the utility of machine learning in achieving optimal dynamics for elastic microswimmers and introduces post-analysis methods to extract crucial physical quantities such as non-reciprocality and odd elasticity.


I. INTRODUCTION
Active systems composed of self-driven units play a crucial role in biological processes as they are able to convert microscopic energy into macroscopic work [1-3]. To achieve sustainable work at the microscopic scale, active units must go through non-reciprocal cyclic motions [4,5]. For example, cyclic state transitions of enzymatic molecules are driven by catalytic chemical reactions [6,7], which can be described by simple coarse-grained models [8-10]. To evaluate the functionality of an enzyme, we previously defined a physical quantity called non-reciprocality that represents the area enclosed by a trajectory in the conformational space [11,12]. According to Purcell's scallop theorem for microswimmers moving in a viscous fluid [13-15], the average swimming velocity is proportional to the non-reciprocality and the loop frequency of the cyclic body motion. It was also reported that the crawling speed of a cell on a substrate is determined by the non-reciprocality [16-18].
Recently, Scheibner et al. introduced the concept of odd elasticity, which is useful for characterizing nonequilibrium active systems [19,20]. Odd elasticity, arising from antisymmetric (odd) components of the elastic modulus tensor that violate the energy conservation law, can exist in active materials [21-24], biological systems [25], and active robots [26-28]. We emphasize that the concept of odd elasticity is not limited to elastic materials but can be extended to various dynamical systems [29,30]. An illustrative example is that a microswimmer with odd elasticity can exhibit directional locomotion in the presence of thermal agitation [31]. In fact, the average velocity of an odd microswimmer is proportional to the odd elasticity. On the other hand, in the model of a stochastic enzyme, we have quantified the average work per cycle in terms of effective odd elasticity [12]. Notably, odd elasticity serves as a useful measure for characterizing non-equilibrium micromachines such as proteins, enzymes, microswimmers, and robots, regardless of their specific functions.
* komura@wiucas.ac.cn
Despite the importance of odd elasticity in active systems, its physical origin still needs to be better understood [32]. One possibility is to use Onsager's variational principle [33] to derive dynamical equations for an active system with odd elasticity [34]. The obtained nonreciprocal equations [35] manifest the physical origin of the odd elastic constant that is proportional to the nonequilibrium driving force [34]. On the other hand, odd elasticity may not be innate to micromachines but can be an ability acquired after many experiences and training processes. In this work, considering an elastic three-sphere microswimmer model [36-38], we utilize machine learning techniques to account for the emergence of an odd elastic relation between its elastic components. With this approach, a microswimmer can automatically obtain the most efficient swimming strategy without prescribing any deformation dynamics.
In recent years, machine learning has been widely applied to active systems as a powerful tool to unravel the complexities of biological systems [40-43]. Notably, reinforcement learning techniques have proven capable and versatile in training various microswimmers. These methods have been used to navigate them through complex and dynamic environments with remarkable adaptability, such as path-planning in turbulent flows or noisy surroundings [44-47]. Furthermore, machine learning has been applied to optimize the local motion and intricate navigation of microswimmers with complex structures or higher degrees of freedom [48-50]. These approaches have been further extended to more difficult tasks, such as cooperative swimming and predation models [51-53].
The main aim of this article is to reveal the emergence of odd elasticity in an elastic microswimmer model by using the Deep Q-Network with reinforcement learning. Traditional kinematic models, such as rigidly connected swimmers, select paths in the deformation space (gait-switching) assuming that the applied forces can instantaneously adapt to any prescribed motion [48,54,55]. However, the deformation of an elastic microswimmer cannot be prescribed [36]. Determining these dynamics requires a nuanced understanding of the elastohydrodynamic process, where traditional methods for achieving optimal control face significant challenges. These challenges are evident in the previous study [36], in which the swimming velocity of an elastic microswimmer decreases in its large-frequency regime when prescribed dynamics are assumed.
Unlike the prescribed dynamics model, our machine-learning approach successfully develops an optimal control strategy, adopting a strategy transition (the emergence of the waiting behavior) to avoid the velocity decrease. We note that such elastohydrodynamic systems, different from the study in Tsang et al. [48], usually require continuous state and action spaces. The newly discovered strategy transition using the waiting behavior emerges from the fluid-structure interactions in which distinct hydrodynamic modes with different time scales play important roles.
From the obtained numerical data, we quantify the performance and effective odd elasticity of the trained microswimmer. The estimated cycle performance, which is the product of the non-reciprocality (loop area) and the loop frequency, coincides with the swimming velocity by using a proper scale factor. We also demonstrate that the emergent odd elasticity of the microswimmer is closely related to the loop frequency of the cyclic deformation. The present work demonstrates the utility of machine learning in revealing various non-reciprocal phenomena in active systems.
In Sec. II, we review the model of an elastic microswimmer with prescribed dynamics [36]. In Sec. III, we explain the deep reinforcement learning technique to train the elastic microswimmer. In Sec. IV, we describe the physical properties of the fully trained microswimmer. In particular, we shall discuss the emergence of limit cycles, cycle performance, average velocity, and effective odd elasticity. In Sec. V, we briefly mention the training progression of an elastic microswimmer. A summary of our work and some discussion are given in Sec. VI.

II. ELASTIC MICROSWIMMER WITH PRESCRIBED DYNAMICS
We first review the elastic three-sphere microswimmer model introduced by the present authors [36,56-58] and others [37-39]. As illustrated in Fig. 1(a), the microswimmer consists of three spheres of radius $a$ positioned along a one-dimensional coordinate system, denoted by $x_i$ ($i = 1, 2, 3$). One can assume $x_1 < x_2 < x_3$ without loss of generality. Unlike the original three-sphere microswimmer model by Najafi and Golestanian [54,55], the three spheres are connected by two harmonic springs, each with a time-dependent natural length $\ell_\alpha$ ($\alpha = A, B$) and having the same spring elastic constant $K$. Such an elastic microswimmer is immersed in a viscous fluid with the shear viscosity $\eta$. Although $a$ and $K$ can differ between the spheres and the springs, respectively [36,55,59], we consider here the symmetric case. This elastic microswimmer model [36] reduces to the original three-sphere model with rigid arms [54,55] when $K$ is infinitely large.
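When the spring lengths $L_A = x_2 - x_1$ and $L_B = x_3 - x_2$ deviate from their natural lengths $\ell_\alpha$, the elastic forces $f_i$ acting on each sphere are given by
$$ f_1 = K(L_A - \ell_A), \tag{1} $$
$$ f_2 = -K(L_A - \ell_A) + K(L_B - \ell_B), \tag{2} $$
$$ f_3 = -K(L_B - \ell_B). \tag{3} $$
FIG. 2. The scaling function $G(\hat\Omega)$ in Eq. (12) that describes the frequency dependence of the average swimming velocity $\overline{V}$ of an elastic swimmer with prescribed dynamics of the natural lengths given by Eqs. (9) and (10).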
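Due to hydrodynamic interactions described by the Stokes mobility and the Oseen tensor, the forces $f_i$ and the sphere velocities $v_i = \dot{x}_i = dx_i/dt$ (the dot indicates the time derivative) are related by [54,55]
$$ v_1 = \frac{f_1}{6\pi\eta a} + \frac{f_2}{4\pi\eta L_A} + \frac{f_3}{4\pi\eta (L_A + L_B)}, \tag{4} $$
$$ v_2 = \frac{f_1}{4\pi\eta L_A} + \frac{f_2}{6\pi\eta a} + \frac{f_3}{4\pi\eta L_B}, \tag{5} $$
$$ v_3 = \frac{f_1}{4\pi\eta (L_A + L_B)} + \frac{f_2}{4\pi\eta L_B} + \frac{f_3}{6\pi\eta a}, \tag{6} $$
where the conditions $a/L_\alpha \ll 1$ are assumed. For the elastic microswimmer, the force-free condition, $f_1 + f_2 + f_3 = 0$, is automatically satisfied, ensuring a self-propelled motion without any external force. When $L_\alpha$ and $\ell_\alpha$ are given, the dynamics of the microswimmer are deterministic and the total swimming velocity $V = (v_1 + v_2 + v_3)/3$ is given by
$$ V = \frac{1}{12\pi\eta}\left[\frac{f_1 + f_2}{L_A} + \frac{f_2 + f_3}{L_B} + \frac{f_1 + f_3}{L_A + L_B}\right]. \tag{7} $$
For relatively small deformations of the springs, we can define the small displacements with respect to the average spring length $\ell$ as $u_\alpha = L_\alpha - \ell$ ($\alpha = A, B$). Within the small-amplitude approximation, $u_\alpha/\ell \ll 1$, Golestanian and Ajdari calculated the average swimming velocity of a three-sphere microswimmer up to the leading order as [55]
$$ \overline{V} = \frac{7a}{24\ell^2}\,\overline{(u_A \dot{u}_B - \dot{u}_A u_B)}. \tag{8} $$
Here the averaging, indicated by the bar, is performed by time integration over a full cycle, divided by the period. The above expression indicates that $\overline{V}$ is determined by the product of the closed loop area and the loop frequency [60].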
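The force and mobility relations of this section can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the standard harmonic-spring forces and the Stokes plus Oseen mobility described above, and the function names (`elastic_forces`, `sphere_velocities`, `swimming_velocity`) and the default values `K = a = eta = 1` are hypothetical choices made for illustration.

```python
import math

def elastic_forces(L_A, L_B, l_A, l_B, K=1.0):
    """Forces on spheres 1, 2, 3 from two harmonic springs of constant K."""
    f1 = K * (L_A - l_A)       # a stretched spring A pulls sphere 1 forward
    f3 = -K * (L_B - l_B)      # a stretched spring B pulls sphere 3 backward
    f2 = -(f1 + f3)            # reactions on the middle sphere (force-free)
    return f1, f2, f3

def sphere_velocities(f, L_A, L_B, a=1.0, eta=1.0):
    """Stokes self-mobility plus Oseen coupling; assumes a / L << 1."""
    f1, f2, f3 = f
    self_mob = 1.0 / (6.0 * math.pi * eta * a)

    def oseen(r):
        return 1.0 / (4.0 * math.pi * eta * r)

    v1 = self_mob * f1 + oseen(L_A) * f2 + oseen(L_A + L_B) * f3
    v2 = oseen(L_A) * f1 + self_mob * f2 + oseen(L_B) * f3
    v3 = oseen(L_A + L_B) * f1 + oseen(L_B) * f2 + self_mob * f3
    return v1, v2, v3

def swimming_velocity(v):
    """Instantaneous swimming velocity V = (v1 + v2 + v3) / 3."""
    return sum(v) / 3.0

# Illustrative state: spring A stretched, spring B compressed.
f = elastic_forces(L_A=10.5, L_B=9.8, l_A=10.0, l_B=10.0)
v = sphere_velocities(f, L_A=10.5, L_B=9.8)
print("net force:", sum(f), "V:", swimming_velocity(v))
```

Because the three forces sum to zero, the self-mobility terms cancel in the average, so only the Oseen (hydrodynamic interaction) terms contribute to the instantaneous swimming velocity.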
The explicit form of $\overline{V}$ of an elastic microswimmer can be obtained by specifying a prescribed cyclic change in the natural spring lengths $\ell_\alpha$. Previously, we used the following sinusoidal forms [36,58]
$$ \ell_A(t) = \ell + d_A \cos(\Omega t), \tag{9} $$
$$ \ell_B(t) = \ell + d_B \cos(\Omega t - \phi), \tag{10} $$
where $\ell$ is the constant natural length, $d_\alpha$ are the amplitudes of the oscillatory change, $\Omega$ is the common frequency, and $\phi$ is the phase difference between the two cyclic changes. When the natural lengths undergo this prescribed cycle, the spring lengths relax to their new natural lengths obeying Eqs. (1)-(6) with a hydrodynamic relaxation time $\tau = 6\pi\eta a/K$. Then the average swimming velocity of an elastic microswimmer with the prescribed dynamics was calculated to be [36,58]
$$ \overline{V} = \frac{7 d_A d_B a}{24 \ell^2 \tau}\, G(\hat\Omega) \sin\phi, \tag{11} $$
where $\hat\Omega = \Omega\tau$ is the dimensionless frequency and $G(\hat\Omega)$ is the scaling function
$$ G(\hat\Omega) = \frac{3\hat\Omega\,(3 + \hat\Omega^2)}{9 + 10\hat\Omega^2 + \hat\Omega^4}. \tag{12} $$
We see that $\overline{V}$ is non-zero when $\phi \neq 0$, corresponding to the non-reciprocal deformation, and $|\overline{V}|$ is maximized when $\phi = \pm\pi/2$. Notably, such non-reciprocal deformation can be effectively generated by assuming an antisymmetric stress-strain relation among the two springs [30,31]. This antisymmetric cross-correlation represents what we refer to as odd elasticity in this article. In Fig. 2, we plot $G(\hat\Omega)$ using Eq. (12). In the small-frequency limit, where $\hat\Omega \ll 1$, the average velocity increases as $\overline{V} \sim \hat\Omega$ [54,55]. However, in the large-frequency limit, where $\hat\Omega \gg 1$, the average velocity decreases as $\overline{V} \sim \hat\Omega^{-1}$ [36,58]. The crossover frequency between these two regimes is approximately $\hat\Omega \approx 1$. In the large-frequency regime, the mechanical response is delayed because it takes time for the springs to relax to their natural lengths. Such a decrease in the swimming velocity is a drawback of elastic microswimmers with prescribed motion. A similar crossover behavior and a decrease in the average velocity were also predicted for the Najafi-Golestanian microswimmer model in a viscoelastic fluid [61,62].
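The two frequency regimes can be illustrated with a short numerical sketch. It assumes the scaling function $G(\hat\Omega) = 3\hat\Omega(3+\hat\Omega^2)/(9+10\hat\Omega^2+\hat\Omega^4)$, which follows from the two relaxation modes of the symmetric swimmer (rates $1/\tau$ and $3/\tau$ at leading order in $a/\ell$), and keeps only the $\sin\phi$ contribution to the velocity. The function names and default parameters are hypothetical.

```python
import math

def scaling_G(omega_hat):
    """Assumed scaling function G for the sin(phi) term of the average velocity."""
    w2 = omega_hat * omega_hat
    return 3.0 * omega_hat * (3.0 + w2) / (9.0 + 10.0 * w2 + w2 * w2)

def prescribed_velocity(omega_hat, d_A, d_B, phi, l=10.0, a=1.0, tau=1.0):
    """Average velocity for sinusoidal natural lengths (sin(phi) term only)."""
    prefactor = 7.0 * d_A * d_B * a / (24.0 * l * l * tau)
    return prefactor * scaling_G(omega_hat) * math.sin(phi)

# Scan the dimensionless frequency over four decades.
for omega_hat in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(omega_hat, scaling_G(omega_hat))
```

For small arguments the function grows linearly, and for large arguments it decays as the inverse frequency, reproducing the two limits quoted in the text.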

III. ELASTIC MICROSWIMMER DIRECTED BY REINFORCEMENT LEARNING
Using an elastic microswimmer model, we apply a machine learning method to direct its movement rather than prescribing the motion of the natural lengths $\ell_\alpha$. We combine the Deep Q-Network (DQN) with reinforcement learning [63-65] to train the actuation of a microswimmer and obtain the optimized dynamics for the natural lengths. In particular, we shall investigate how a trained microswimmer adapts to a new strategy to avoid the decrease in the average swimming velocity when the actuation is faster than the hydrodynamic relaxation.
As shown in Fig. 1(b), our artificial intelligence (AI) uses the spring lengths $L_\alpha$ and the natural lengths $\ell_\alpha$ as an observation (state) input and performs an action to change either $\ell_A$ or $\ell_B$ with an actuation velocity $U$, i.e., the rate of change in the natural lengths. Specifically, the output action space is discrete and consists of four actions $\ell_\alpha^{\pm}$ corresponding to the increase and decrease of $\ell_\alpha$. When the natural lengths $\ell_\alpha$ change, the spring lengths $L_\alpha$ tend to relax toward the new natural lengths according to Eqs. (1)-(6). Hereinafter, we choose the sphere radius $a$ and the hydrodynamic relaxation time $\tau = 6\pi\eta a/K$ as the units for length and time, respectively. The dimensionless quantities are then denoted with a hat such as $\hat{L}_\alpha = L_\alpha/a$, $\hat\ell_\alpha = \ell_\alpha/a$, and $\hat{U} = U\tau/a$.
During the training process, we constrain the natural lengths in the range $8 \le \hat\ell_\alpha \le 12$ to ensure the conditions $a/L_\alpha \ll 1$ and $(L_\alpha - \ell)/\ell \ll 1$. This assignment simplifies the model and aligns with established methods, enabling valid comparisons with existing studies [31,54,55,58]. In our model, the actuation velocity $\hat{U}$ is a control parameter and can be contrasted with the frequency $\hat\Omega$ in Sec. II. We train the microswimmers at different actuation velocities ranging from $\hat{U} = 0.1$ to 10, with an increment of 0.1. Each training session comprises 200 episodes, and each episode contains 1,200 decisions made by the AI.
To optimize the locomotion of the microswimmer, we initiate each episode with a random state and employ the Epsilon Greedy Algorithm (EGA) to balance exploration and exploitation during the training process [65]. Actions are determined successively by the AI in each decision step, in which one of the natural lengths $\hat\ell_\alpha$ is changed by $\pm 1$ with a given actuation velocity $\hat{U}$. Hence, consecutive decisions are made at intervals of $\Delta\hat{t}_{\rm dec} = 1/\hat{U}$. Since the numerical time step $\Delta\hat{t}_{\rm num} = 0.01$ is used to solve the hydrodynamic equations in Eqs. (1)-(6), the relation $\Delta\hat{t}_{\rm dec}/\Delta\hat{t}_{\rm num} = 100/\hat{U}$ gives between 10 and 1000 numerical time steps between successive actions for $0.1 \le \hat{U} \le 10$.
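The decision timing and the exploration rule described above can be sketched as follows. This is an illustrative sketch only: the action labels, the function names, and the fixed value of `epsilon` are assumptions, since the text does not specify the exploration schedule used in training.

```python
import random

# Hypothetical labels for the four discrete actions (increase or decrease l_A or l_B).
ACTIONS = ("lA+", "lA-", "lB+", "lB-")

def steps_per_decision(U_hat, dt_num=0.01):
    """Number of numerical time steps in one decision interval 1 / U_hat."""
    return round(1.0 / (U_hat * dt_num))

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

rng = random.Random(0)
q_values = [0.1, 0.5, 0.2, 0.0]   # made-up Q-values for one state
action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
print(ACTIONS[action], steps_per_decision(2.0))
```

Passing an explicit `random.Random` instance keeps the exploration reproducible, which is convenient when comparing training curves across actuation velocities.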
In our model, successive actions are consistently taken at intervals of $\Delta\hat{t}_{\rm dec}$. This means that the subsequent action is executed immediately after $\hat\ell_\alpha$ has evolved by $\pm 1$, irrespective of whether the spring lengths $\hat{L}_\alpha$ have fully relaxed to the new natural lengths $\hat\ell_\alpha$ or not. In specific situations, the machine can conduct a waiting strategy where it refrains from changing any natural lengths for $\Delta\hat{t}_{\rm dec}$. Such a decision arises from actions that violate the natural length constraints. For example, taking $\hat\ell_A^{+}$ is not allowed when $\hat\ell_A = 12$. In this case, no change is applied to $\hat\ell_A$, and the consecutive action is conducted after $\Delta\hat{t}_{\rm dec}$.
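One decision step, including the waiting behavior at the constraint boundary, can be sketched as below. The sketch works in dimensionless units (lengths in $a$, time in $\tau$, forces in $Ka$), where the Stokes self-mobility is 1 and the Oseen coupling is $3/(2\hat{L})$. The explicit Euler integrator, the linear ramp of the natural length, and all names are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical action encoding: (index of the natural length, direction of change).
ACTIONS = {0: (0, +1), 1: (0, -1), 2: (1, +1), 3: (1, -1)}

def decision_step(x, l_nat, action, U_hat, dt=0.01, l_min=8.0, l_max=12.0):
    """Advance one decision interval; return positions, natural lengths, reward, waited."""
    idx, sign = ACTIONS[action]
    target = l_nat[idx] + sign
    waiting = not (l_min <= target <= l_max)   # disallowed action: wait instead
    rate = 0.0 if waiting else sign * U_hat
    n_steps = round(1.0 / (U_hat * dt))
    x, l_nat = list(x), list(l_nat)
    start = sum(x) / 3.0
    for _ in range(n_steps):
        l_nat[idx] += rate * dt                # ramp the chosen natural length
        L_A, L_B = x[1] - x[0], x[2] - x[1]
        f1 = L_A - l_nat[0]
        f3 = -(L_B - l_nat[1])
        f2 = -(f1 + f3)
        v1 = f1 + 1.5 * (f2 / L_A + f3 / (L_A + L_B))
        v2 = f2 + 1.5 * (f1 / L_A + f3 / L_B)
        v3 = f3 + 1.5 * (f1 / (L_A + L_B) + f2 / L_B)
        x = [x[0] + v1 * dt, x[1] + v2 * dt, x[2] + v3 * dt]
    if not waiting:
        l_nat[idx] = target                    # remove accumulated rounding error
    reward = sum(x) / 3.0 - start              # displacement of the swimmer center
    return x, l_nat, reward, waiting

# Relaxed swimmer with l_A already at its upper bound: the action "l_A+" becomes a wait.
state = decision_step([0.0, 12.0, 22.0], [12.0, 10.0], action=0, U_hat=2.0)
print(state)
```

During a wait, the natural lengths stay fixed while the springs continue to relax, which is how the trained swimmer lets the slow hydrodynamic mode catch up.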
The training process is formulated as a Markov decision process (MDP) with the memoryless property [65], ensuring that the immediate reward depends only on the current state and action. Within the MDP method, the training of the DQN is guided by Bellman's equation that is used to iteratively update the prediction of the Q-value function (the expected cumulative future reward) in every decision step [63-65]. In our model, Bellman's equation is given by
$$ Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}), \tag{13} $$
where $Q(s_t, a_t)$ represents the Q-value of taking an action $a_t \in \{\hat\ell_\alpha^{\pm}\}$ in state $s_t = (\hat{L}_\alpha, \hat\ell_\alpha)$ at time $t$, and $r_t$ denotes the reward obtained after taking action $a_t$ in state $s_t$. The reward is defined as the positive displacement of the whole microswimmer during a decision step. The coefficient $\gamma$ represents the discount factor that balances the importance between immediate rewards and future rewards. We choose a conventional value of $\gamma = 0.99$ for farsightedness to prioritize long-term cumulative rewards [48-50,53]. By updating the network with Bellman's equation in every decision step, our DQN efficiently learns the optimal policy for controlling the dynamics of the natural lengths $\hat\ell_\alpha$.
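The Bellman update used to train the Q-values can be illustrated with a few lines of code. This is a sketch of the standard one-step target only: the neural network, replay buffer, and optimizer of a full DQN are omitted, and the function names and numerical Q-values are hypothetical.

```python
def bellman_target(reward, next_q_values, gamma=0.99):
    """One-step target: immediate reward plus discounted best future Q-value."""
    return reward + gamma * max(next_q_values)

def td_error(q_current, reward, next_q_values, gamma=0.99):
    """Difference between the Bellman target and the current Q-value estimate."""
    return bellman_target(reward, next_q_values, gamma) - q_current

def effective_horizon(gamma):
    """Rough number of future decisions that contribute to the discounted return."""
    return 1.0 / (1.0 - gamma)

# Made-up numbers: reward of one decision step and Q-values of the four next actions.
print(bellman_target(0.02, [0.1, 0.3, 0.2, 0.0]))
print(td_error(0.3, 0.02, [0.1, 0.3, 0.2, 0.0]))
print(effective_horizon(0.99))
```

With $\gamma = 0.99$ the effective horizon is of order one hundred decisions, much longer than the sixteen decisions of one square cycle, which is consistent with the emphasis on long-term cumulative rewards.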
In Fig. 3(a), we show the training curve of a microswimmer when the actuation velocity is $\hat{U} = 2$. As a function of the trained episodes, we plot the total distance, namely, the net displacement from the initial position that the microswimmer can achieve within each episode. During the training process, the total distance continuously increases over the episodes, indicating enhanced locomotion ability. After about 100 episodes of training, the microswimmer's performance approaches an optimal swimming distance within each single episode. The small fluctuations that remain after reaching the optimal total distance are due to the random initial conditions and the EGA used for the training. Details of the training progression will be discussed separately in Sec. V.
The characteristics of the training curve vary for swimmers with different values of $\hat{U}$. As $\hat{U}$ increases, swimmers require more training episodes to achieve optimized swimming velocity. This trend becomes pronounced at higher values of $\hat{U}$ (e.g., $\hat{U} \sim 20$-$30$), which are not included in this article due to the lack of additional physical interest. On the other hand, the standard deviation of the training datasets shows increased sensitivity within the range $\hat{U} \in [0.1, 10]$. This increase in standard deviation is due to the complexity arising from the delayed response of the actual spring lengths $L_\alpha$. Such a delay in response leads to a behavior transition, which will be further discussed in Sec. IV.

IV. FULLY TRAINED ELASTIC MICROSWIMMER
In this section, we discuss the dynamic properties of the microswimmers which have been trained for 200 episodes. Besides the emergence of limit cycles, we shall discuss the performance and effective odd elasticity of trained microswimmers.

A. Emergence of limit cycles
In Figs. 3(b) and (c), we present the cyclic motions of the fully trained swimmer when $\hat{U} = 2$. The black lines represent the deviation of the natural lengths from their average value, i.e., $\hat{d}_\alpha = \hat\ell_\alpha - \hat\ell$ ($\alpha = A, B$), where we chose $\hat\ell = 10$ because of the constraint $8 \le \hat\ell_\alpha \le 12$. The red lines represent the spring extensions $\hat{u}_\alpha = \hat{L}_\alpha - \hat\ell$ ($\alpha = A, B$). Both $\hat{u}_A$ and $\hat{u}_B$ exhibit periodic motions with a phase difference of approximately $\pi/2$, which corresponds to the maximum efficiency for swimming.
In Fig. 3(d), we present the configuration space trajectory of the same trained microswimmer over one cycle. The trajectory of the natural lengths $\hat{d}_\alpha$ (shown in black) forms a counterclockwise square in the range $-2 \le \hat{d}_\alpha \le 2$. The spring extensions $\hat{u}_\alpha$ (shown in red) also exhibit a limit cycle in the configuration space. This indicates that the fully trained elastic microswimmer has acquired steady non-reciprocal spring motion after training.
To investigate the dependence on the actuation velocity $\hat{U}$, we plot the configuration space trajectories for different $\hat{U}$-values ranging from 0.1 to 10 in Figs. 4(a)-(h) (Fig. 4(d) and Fig. 3(d) are the same). In all the cases, the natural lengths $\hat{d}_\alpha$ (shown in black) follow the same counterclockwise square shape. When the actuation velocity is small, such as $\hat{U} \le 2$ in Figs. 4(a)-(d), the hydrodynamic relaxation process can catch up with the change in the natural lengths, and the cycles of the spring extensions $\hat{u}_\alpha$ (shown in red) are close to the square-shaped trajectory. In this regime, the enclosed area within the loop decreases when $\hat{U}$ is increased, as we quantify in the next subsection.
For larger actuation velocity $\hat{U} > 2$, corresponding to Figs. 4(e)-(h), the hydrodynamic relaxation becomes the slower mode. This situation is similar to the case of $\hat\Omega > 1$ in Sec. II. For the trained microswimmer, however, the AI adopts a waiting strategy once the natural lengths reach the maximum or minimum values ($\hat{d}_\alpha = \pm 2$) so that the spring extensions $\hat{u}_\alpha$ can relax sufficiently. This waiting strategy allows the spring lengths to catch up with the large actuation velocity $\hat{U}$ and prevents the enclosed area from further decreasing. Since the dynamics are no longer dominated by the actuation velocity $\hat{U}$, the trajectories of $\hat{u}_\alpha$ become less squared and less symmetric for $\hat{U} > 2$. As a result, the distinctions between different loop shapes become less pronounced in Figs. 4(e)-(h).
Another notable result in Fig. 4 is the fore-aft amplitude asymmetry between $\hat{u}_A$ and $\hat{u}_B$. In [...]

B. Cycle performance and swimming velocity
Next, we discuss the performance of the acquired cyclic motion and the average swimming velocity of the fully trained microswimmers. Following our work on catalytic enzymes [11,12], we consider the following quantity called non-reciprocality
$$ R = \frac{1}{2}\int_0^T dt\,(u_A \dot{u}_B - \dot{u}_A u_B), \tag{14} $$
where $T$ is the period of one cycle. We note again that $R$ represents the area enclosed by the loop trajectory in the configuration space [60]. Then the dimensionless average velocity $\hat{V} = \overline{V}\tau/a$ obtained from Eq. (8) can be rewritten in terms of the dimensionless non-reciprocality $\hat{R} = R/a^2$ as
$$ \hat{V} = \frac{7}{12\hat\ell^2}\,\frac{\hat{R}}{\hat{T}}, \tag{15} $$
where $\hat{T} = T/\tau$. We shall call $1/\hat{T}$ the dimensionless loop frequency. The quantity $\hat{R}/\hat{T}$, representing the area enclosed by the loop per unit time, thus determines the swimming velocity when the deformation is small. In Fig. 6(a), we plot the non-reciprocality $\hat{R}$ as a function of $\hat{U}$ for fully trained microswimmers. As indicated in Fig. 4, the enclosed area $\hat{R}$ decreases as $\hat{U}$ increases. In Fig. 6(b), the loop frequency $1/\hat{T}$ [...] deviates from the linear relation. This is because the actuation velocity $\hat{U}$ outpaces the hydrodynamic relaxation rate, and the AI adopts the waiting strategy to adjust to the slow hydrodynamic mode. The slope of the dashed line is 1/16 ($1/\hat{T} = \hat{U}/16$), and hence the dimensionless waiting time at $\hat{d}_\alpha = \pm 2$ is estimated by $\hat{T} - 16/\hat{U}$. The finite waiting time appears as a transition at $\hat{U} \approx 2$. As plotted in Fig. 6(c), we find that the cycle performance, as measured by $\hat{R}/\hat{T}$, increases monotonically with $\hat{U}$ and eventually approaches the value $\hat{R}/\hat{T} \approx 1.75$, bounded by the hydrodynamic relaxation process.
To check the validity of Eq. (15), we plot in Fig. 6(d) [...] Eq. (15) when $\hat\ell = 10$. A small difference between these pre-factors comes from the assumption $u_\alpha/\ell \ll 1$ used in Eq. (8) or Eq. (15). As previously shown in Fig. 5, the spring deformations for our trained microswimmers can reach $\hat{u}_\alpha/\hat\ell \approx 0.2$ or larger. Higher-order contributions need to be incorporated into Eq. (15) to reproduce the average swimming velocity achieved by our swimmer model.
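The small-deformation estimate can be sketched numerically. The code below assumes that the non-reciprocality is the signed loop area $\hat{R} = \frac{1}{2}\oint(\hat{u}_A\,d\hat{u}_B - \hat{u}_B\,d\hat{u}_A)$ and that the dimensionless velocity is $\hat{V} = 7\hat{R}/(12\hat\ell^2\hat{T})$. The input is the ideal square of the natural lengths, not simulation data, and the function names are hypothetical.

```python
def loop_area(u_A, u_B):
    """Signed area of a closed polygonal loop (positive when counterclockwise)."""
    n = len(u_A)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n                      # wrap around to close the loop
        total += u_A[i] * u_B[j] - u_A[j] * u_B[i]
    return 0.5 * total

def velocity_from_cycle(R_hat, T_hat, l_hat=10.0):
    """Leading-order dimensionless velocity from loop area and loop period."""
    return 7.0 * R_hat / (12.0 * l_hat ** 2 * T_hat)

# Ideal counterclockwise square -2 <= d <= 2 traced by the natural lengths.
square_A = [-2.0, 2.0, 2.0, -2.0]
square_B = [-2.0, -2.0, 2.0, 2.0]
R_hat = loop_area(square_A, square_B)
T_hat = 16.0 / 2.0                            # no-waiting period 16 / U_hat at U_hat = 2
print(R_hat, R_hat / T_hat, velocity_from_cycle(R_hat, T_hat))
```

The square gives an upper bound on the loop area, because the spring extensions of a real swimmer lag behind the natural lengths and enclose a smaller loop.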
The behavior of $\hat{V}$ in Fig. 6(d) for the trained microswimmer is in sharp contrast to that of an elastic microswimmer whose natural spring motions are prescribed. In Sec. II, we showed in Eq. (11) and Fig. 2 that the average velocity decreases when $\hat\Omega > 1$ [36,58]. For the fully trained microswimmer, however, an emergent waiting strategy appears when $\hat{U} > 2$, such that the swimming velocity does not decrease even at higher actuation velocities $\hat{U}$. This discovery of a strategy transition exemplifies how machine learning can reveal the intricate optimal dynamics of complex microswimmers.

C. Effective even and odd elasticities
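Finally, we explain the method to extract the effective even and odd elasticities of the trained microswimmer from the numerical data. A direct way is to assume the following odd-elastic Hookean relations between the forces and the spring extensions [12,30,35]:
$$ \begin{pmatrix} F_A \\ F_B \end{pmatrix} = \begin{pmatrix} k^{\rm e} & k' + k^{\rm o} \\ k' - k^{\rm o} & k^{\rm e} \end{pmatrix} \begin{pmatrix} u_A \\ u_B \end{pmatrix}. \tag{16} $$
In the above expression, the forces $F_\alpha$ are given by Stokes' law $F_\alpha = -6\pi\eta a\,\dot{u}_\alpha$ and $u_\alpha = L_\alpha - \ell$ as before.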
In the elastic matrix, $k^{\rm e}$ and $k'$ represent the diagonal and off-diagonal even elasticities, respectively, while $k^{\rm o}$ represents the effective odd elasticity. The diagonal even elasticity $k^{\rm e}$ should be distinguished from the spring elastic constant $K$ in the elastic microswimmer model. Typically, a finite $k^{\rm e}$ results in damped oscillation. Given the fact that our data demonstrate sustained oscillations without amplitude decay, we assume in the following that $k^{\rm e} = 0$. On the other hand, the off-diagonal even elasticity $k'$ characterizes the fore-aft amplitude asymmetry as described in Fig. 5, and the odd elasticity $k^{\rm o}$ should quantify the non-reciprocal dynamics of the microswimmer [12,30].
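To obtain $k'$ and $k^{\rm o}$ from the numerical data, we calculate the cross-correlations $\overline{F_A u_B}$ and $\overline{F_B u_A}$ averaged over one period of the deformation cycle. This is possible since the trained dynamics for the two springs are periodic with the same frequency and a finite phase difference. Since these correlations are given by
$$ \overline{F_A u_B} = (k' + k^{\rm o})\,\overline{u_B^2}, \qquad \overline{F_B u_A} = (k' - k^{\rm o})\,\overline{u_A^2}, \tag{17} $$
the effective (off-diagonal) even and odd elastic coefficients can be obtained from
$$ k' = \frac{1}{2}\left(\frac{\overline{F_A u_B}}{\overline{u_B^2}} + \frac{\overline{F_B u_A}}{\overline{u_A^2}}\right), \qquad k^{\rm o} = \frac{1}{2}\left(\frac{\overline{F_A u_B}}{\overline{u_B^2}} - \frac{\overline{F_B u_A}}{\overline{u_A^2}}\right), \tag{18} $$
where all the related correlations can be directly calculated from the numerical data. In Figs. 7(a) and (b), we plot the dimensionless even elasticity, $\hat{k}' = k'/K$, and odd elasticity, $\hat{k}^{\rm o} = k^{\rm o}/K$, respectively, as functions of $\hat{U}$. Both $\hat{k}'$ and $\hat{k}^{\rm o}$ vanish when $\hat{U} = 0$. We recognize that the behavior of $\hat{k}'$ is similar to that of $\hat{R}$ in Fig. 6(a) (except for a constant shift), while the data of $\hat{k}^{\rm o}$ closely resemble those of $1/\hat{T}$ in Fig. 6(b). These results indicate that $\hat{k}'$ characterizes the extent of amplitude asymmetry that reduces the enclosed area, whereas $\hat{k}^{\rm o}$ directly corresponds to the loop frequency. Since $1/\hat{T} \propto \hat{k}^{\rm o}$, we confirm that the average swimming velocity $\hat{V}$ in Eq. (15) is proportional to the odd elasticity $k^{\rm o}$, i.e., $\hat{V} \propto k^{\rm o}$. Notably, the odd elasticity $\hat{k}^{\rm o}$ is proportional to the actuation velocity up to $\hat{U} \le 2$, similar to $1/\hat{T}$.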
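The extraction of the effective elasticities from a periodic trajectory can be sketched as follows. The sign convention is an assumption: the Stokes force is taken as $\hat{F}_\alpha = -d\hat{u}_\alpha/d\hat{t}$ (time in units of $\tau$, elasticities in units of $K$), the relation between forces and extensions as $F_A = (k' + k^{\rm o})u_B$ and $F_B = (k' - k^{\rm o})u_A$ with $k^{\rm e} = 0$, and the signs of both outputs flip under the opposite convention. The finite-difference derivative and all names are hypothetical choices.

```python
import math

def periodic_derivative(u, dt):
    """Central difference on samples covering exactly one period (endpoint not repeated)."""
    n = len(u)
    return [(u[(i + 1) % n] - u[i - 1]) / (2.0 * dt) for i in range(n)]

def mean(values):
    return sum(values) / len(values)

def effective_elasticities(u_A, u_B, dt):
    """Return (off-diagonal even, odd) elasticities in units of K, assuming k_e = 0."""
    F_A = [-d for d in periodic_derivative(u_A, dt)]   # assumed Stokes force on spring A
    F_B = [-d for d in periodic_derivative(u_B, dt)]   # assumed Stokes force on spring B
    c_AB = mean([f * u for f, u in zip(F_A, u_B)]) / mean([u * u for u in u_B])
    c_BA = mean([f * u for f, u in zip(F_B, u_A)]) / mean([u * u for u in u_A])
    return 0.5 * (c_AB + c_BA), 0.5 * (c_AB - c_BA)

# Synthetic counterclockwise cycle with period T_hat = 8 and unequal amplitudes.
n, T_hat = 2000, 8.0
dt = T_hat / n
omega = 2.0 * math.pi / T_hat
u_A = [1.5 * math.cos(omega * i * dt) for i in range(n)]
u_B = [1.0 * math.sin(omega * i * dt) for i in range(n)]
print(effective_elasticities(u_A, u_B, dt))
```

For a circular cycle of equal amplitudes, this estimator returns a vanishing even part and an odd part equal to the angular loop frequency $2\pi/\hat{T}$, in line with the reported proportionality between the odd elasticity and $1/\hat{T}$. Unequal amplitudes produce a nonzero off-diagonal even part.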
The fact that the effective odd elasticity $\hat{k}^{\rm o}$ is proportional to the loop frequency $1/\hat{T}$ is consistent with our model of a stochastic odd microswimmer in which the presence of odd elasticity was implemented [31,59]. In the odd microswimmer model, the probability flux forms a closed loop and the eigenvalues of the corresponding frequency matrix are proportional to the odd elasticity. For the time-correlation functions in general odd Langevin systems [30], it was shown that odd elasticity determines the frequency of the sinusoidal component both in their symmetric and anti-symmetric parts. These results support the relationship wherein the work per cycle is generally determined by the product of the odd elasticity and the closed loop area in active systems [19,20].

V. TRAINING PROGRESSION OF ELASTIC MICROSWIMMER
In this section, we briefly discuss the dynamic evolution of swimming behavior during the training process. The plots in Figs. 8(a)-(d) illustrate the configuration space trajectories $\hat{u}_\alpha$ ($\alpha = A, B$) at distinct stages of training in Fig. 3(a), namely, 5, 10, 20, and 40 episodes, respectively, when $\hat{U} = 2$. Each trajectory starts from a different initial state denoted by colored dots.
In the initial stage of training, as shown in Fig. 8(a), the swimming behavior is relatively restricted. Starting from different initial states, the system only evolves towards the point $(2, -2)$, showing limited exploration of the motion possibilities. This early stage primarily focuses on optimizing short-term rewards where cyclic motion has not emerged yet. As the microswimmer gains more training experience, its behavior progressively improves. After approximately 10 episodes, as shown in Fig. 8(b), the microswimmer begins to explore long-term rewards accessible through cyclic motions. Enclosed loops emerge in the configuration space, and the microswimmer starts to recognize the dynamics required for sustainable cyclic locomotion.
In Fig. 8(c) at around 20 training episodes, the shape of loops becomes clearer and more refined. This stage shows the swimmer's ability to fine-tune its motion strategy toward the optimal cyclic pattern to maximize long-term rewards. After training for 40 episodes, the microswimmer achieves the optimized limit cycle, as shown in Fig. 8(d). Importantly, this optimized cycle is robust across various initial conditions, and it will eventually approach the limit cycle shown in Fig. 4(d).

VI. SUMMARY AND DISCUSSION
Using deep reinforcement learning (Deep Q-Network), we have investigated how the effective odd elasticity emerges when optimizing the swimming ability of an elastic microswimmer [36-38]. One of the key findings is the optimized natural-length dynamics without the need for prescribed motion. Notably, we observed a strategy transition (the emergence of waiting behavior) when the actuation velocity is $\hat{U} \approx 2$ (Fig. 4). For larger $\hat{U}$, the trained microswimmers adapt to the slow hydrodynamic relaxation and avoid the velocity decrease. This waiting strategy significantly improves the swimming ability compared to the elastic microswimmer with prescribed dynamics having large-frequency oscillations. Additionally, the trained microswimmer exhibits fore-aft asymmetry in the spring amplitudes (Fig. 5), which is generally difficult to predict and implement in the prescribed dynamics.
By calculating the force-displacement correlations for fully trained microswimmers, we have extracted the effective even and odd elasticities, $k'$ and $k^{\rm o}$, respectively (Fig. 7). We have shown that the $\hat{U}$-dependencies of $k'$ and $k^{\rm o}$ closely resemble those of the non-reciprocality $\hat{R}$ in Fig. 6(a) and the loop frequency $1/\hat{T}$ in Fig. 6(b), respectively. From the numerical data, we have further confirmed that the average swimming velocity $\hat{V}$ is proportional to the cycle performance $\hat{R}/\hat{T}$ (Figs. 6(c) and (d)), as predicted in Eq. (15). These results clarify the proportionality between the average velocity and odd elasticity, $\hat{V} \propto k^{\rm o}$. Our study demonstrates the use of machine learning to reveal the emergence of odd elasticity in various active systems.
If we assume that the energy injected through the nonreciprocal process balances with the dissipation due to the net motion of a microswimmer, its power (work per unit time) scales as $\dot{W} \sim (\eta a \overline{V}) \times \overline{V} \propto (k^{\rm o})^2$. Here, $\eta a \overline{V}$ corresponds to the dissipative force, and we have used the relation $\overline{V} \propto k^{\rm o}$ [12]. For an odd microswimmer, it was further shown that all the extracted work due to odd elasticity is converted into the entropy production rate [31]. Hence, odd elasticity is useful to characterize non-reciprocal dynamics of active micromachines, including not only microswimmers but also other molecular motors [29,30]. With our analysis method, the work performance of these micromachines can be quantified by observing their deformation, even without any precise understanding of their specific functions.
In our machine learning model, we adopted the DQN algorithm with a discrete action space to train elastic microswimmers. Since the swimming ability is directly related to the effective strength of non-reciprocal forces, the optimized dynamics are to adjust one of the natural lengths to either its maximum or minimum value before changing the other. For such strategies, the microswimmer does not change the two natural lengths simultaneously, and thus continuous-action algorithms are unnecessary. To validate this argument, we have also employed other continuous-action algorithms, such as Deep Deterministic Policy Gradient (DDPG) [66] and Soft Actor-Critic (SAC) [67]. Both of these methods resulted in the same strategy as the discrete DQN algorithm. Although these continuous-action algorithms are helpful for more complicated microswimmers, the discrete DQN algorithm used in this work is sufficient for an elastic microswimmer moving in a one-dimensional space.
Microswimmers composed of biomaterials are often soft and exhibit viscoelastic responses. This softness is crucial not only because of the inevitable interaction with viscoelastic environments but also due to their versatile functionalities [68-74], such as mechanical signal sensing [75,76], cargo loading and unloading [77,78], and navigation through intricate channels [79]. On the other hand, when one considers manipulating the gait-switching of a microswimmer from a simple potential field, our elastic model becomes more suitable. For example, if we consider manipulating the spheres through optical tweezers or a harmonic electromagnetic field [39,72], the corresponding potential could be determined effectively through Eqs. (1)-(3).
We comment that the even and odd elasticities obtained from the numerical data are assumed to be linear. In a more general scenario, however, these elasticities can be non-linear [20]. Since a diagonal positive elasticity generally leads to a decaying oscillation [34], non-linearity is commonly required for odd-elastic systems to exhibit a stable limit cycle [26]. The inclusion of non-linear elasticity can describe more general cases, such as the spontaneous onset of oscillations with a specific amplitude that is regulated by the ratio of linear and non-linear even elasticities [27]. In the present work, the oscillation amplitudes are determined by the constraint that clips the natural lengths, and hence the linear approach is more suitable.
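The statement that a positive diagonal elasticity leads to a decaying oscillation can be checked with a toy overdamped model, du/dt = −K u, where the matrix K has the even elasticity on the diagonal and the odd elasticity, with opposite signs, off the diagonal. The following Python sketch is written under stated assumptions: unit friction, no hydrodynamic coupling between the two springs, and the hypothetical names `k_even` and `k_odd`. It is not the procedure used to obtain the effective elasticities in Eqs. (19) and (20).

```python
import math


def odd_elastic_relaxation(u0, k_even, k_odd, t):
    """Exact solution of du/dt = -K u at time t.

    K = [[k_even,  k_odd],
         [-k_odd,  k_even]]   (toy linear odd-elastic matrix, unit friction)

    The even part gives exponential decay; the odd part rotates the state
    in the (u_A, u_B) plane at angular rate k_odd.
    """
    decay = math.exp(-k_even * t)
    c, s = math.cos(k_odd * t), math.sin(k_odd * t)
    uA, uB = u0
    return (decay * (c * uA - s * uB), decay * (s * uA + c * uB))


def amplitude(u):
    """Euclidean norm of the two-component deformation."""
    return math.hypot(u[0], u[1])


if __name__ == "__main__":
    u0 = (1.0, 0.0)
    for t in (0.0, 1.0, 2.0, 4.0):
        u = odd_elastic_relaxation(u0, k_even=0.5, k_odd=3.0, t=t)
        print(t, amplitude(u))  # amplitude shrinks as exp(-k_even * t)
```

In this linear model the amplitude decays as exp(−k_even t) while the phase rotates at the rate k_odd, so a closed loop of fixed size requires either non-linearity or, as in the present work, an externally imposed amplitude constraint.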
In our study, a crossover between the elastic- and hydrodynamic-dominated limits generally exists between the large- and small-frequency limits of microswimmers. For such systems, we have shown that a transition of the swimming strategy around the crossover actuation velocity is desirable for optimized performance because a simple prescribed periodic motion becomes less efficient in the large-frequency regime. Crucially, we emphasize a fundamental characteristic of such systems: a significant delay between the applied control (ℓ_α) and the shape responses (L_α). This delay induces strategy transitions that can generally exist in various microswimmers.
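The delay between the control and the response can be made concrete with a deliberately simplified picture: a single spring length L that relaxes toward its natural length ℓ with one relaxation time, dL/dt = (ℓ − L)/τ_r, while ℓ is ramped at a constant actuation velocity U. This toy model, the relaxation time `tau_r`, and the function names below are assumptions for illustration only; the full dynamics of Eqs. (1)-(3), including the coupling between the two springs and the hydrodynamic interactions, are not reproduced.

```python
import math


def lag_under_ramp(U, tau_r, t):
    """Exact lag l(t) - L(t) for dL/dt = (l - L)/tau_r with l(t) = l0 + U*t.

    Starting from L(0) = l0, the lag obeys d(lag)/dt = U - lag/tau_r,
    whose solution saturates at U * tau_r.
    """
    return U * tau_r * (1.0 - math.exp(-t / tau_r))


def simulate_lag(U, tau_r, t_end, dt=1e-3):
    """Explicit-Euler integration of the same lag equation, as a cross-check."""
    lag = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        lag += dt * (U - lag / tau_r)
    return lag


if __name__ == "__main__":
    for U in (0.1, 2.0, 10.0):
        # The saturated lag grows linearly with the actuation velocity.
        print(U, lag_under_ramp(U, tau_r=1.0, t=50.0))
```

In this toy picture the saturated lag is U τ_r, so a faster actuation leaves the shape further behind the control, which is the qualitative reason why a prescribed periodic motion loses efficiency at large actuation velocities.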
The use of machine learning to optimize gait-switching for microswimmer navigation has been widely studied in the recent literature [40][41][42][43][44][45][46][47][48][49][50][51][52][53]. These studies have provided valuable insights into the potential of machine learning to optimize the dynamics of microswimmers. In the current work, in addition to focusing on performance optimization, we have emphasized the possibility of extracting useful information such as non-reciprocality and odd elasticity. In future studies, we aim to establish universal relations between performance and odd elasticity for different types of microswimmers [15].

FIG. 1. (a) An elastic microswimmer consists of three spheres of radius a whose positions along a one-dimensional coordinate are denoted by x_i (i = 1, 2, 3). The three spheres are connected by two harmonic springs with the elastic constant K and the time-dependent natural lengths ℓ_α (α = A, B). The spring extensions are denoted by L_A = x_2 − x_1 and L_B = x_3 − x_2. The microswimmer is immersed in a viscous fluid with the shear viscosity η, and τ = 6πηa/K gives the hydrodynamic relaxation time. (b) Schematic of the neural network architecture that predicts Q-values for each action from an input observation of L_α and ℓ_α. The blue and red columns represent the input and output layers, respectively. The grey columns are the three linear hidden layers with dimensions 256, 128, and 128.

FIG. 2. The scaling function G(Ω̂) in Eq. (12) that describes the frequency dependence of the average swimming velocity V of an elastic swimmer with prescribed dynamics of the natural lengths given by Eqs. (9) and (10).

FIG. 3. (a) Training curve of an elastic microswimmer with an actuation velocity of Û = 2. The black line represents the total distance achieved by the microswimmer in each training episode for a single case. The blue shaded area indicates the standard deviation around the mean value obtained from 100 training instances of microswimmers with the same actuation velocity. (b), (c) Periodic dynamics of the natural length deviations d̂_α = ℓ̂_α − ℓ̂ (black) and the spring extensions û_α = L̂_α − ℓ̂ (red) obtained from the fully trained microswimmer with 200 episodes of experience. The dimensionless time is defined by t̂ = t/τ, and the parameters are Û = 2 and ℓ̂ = 10. The oscillation phase difference between (b) and (c) is approximately π/2, with a fore-aft amplitude difference. (d) The configuration space trajectory of d̂_α (black) and û_α (red) when Û = 2. Both d̂_α and û_α form counterclockwise cyclic loops.

In Fig. 5, we plot the amplitudes |û_α| as a function of Û. Starting from the value |û_α| = 2, |û_A| and |û_B| decrease and increase, respectively, as Û is increased. Notice that the maximum amplitude can be up to |û_B|/ℓ̂ ≈ 0.25. When Û > 2, however, |û_α| is almost independent of Û. This asymmetry in fore-aft amplitude is due to the non-reciprocal swimming cycle, which is associated with the asymmetric order in which ℓ_α changes.

FIG. 4. Limit cycles of microswimmers trained with different Û-values ranging from 0.1 to 10. Black and red lines represent the natural length deviations d̂_α and the spring extensions û_α, respectively. The plot in (d) for Û = 2 is identical to that in Fig. 3(d).

FIG. 6. Performance of the fully trained microswimmer at various actuation velocities Û. (a) The dimensionless non-reciprocality R̂ = R/a^2 defined in Eq. (14) as a function of Û. This quantity represents the enclosed area of the red loops made of û_α in Fig. 4. (b) The dimensionless loop frequency 1/T̂ = τ/T as a function of Û. When Û ≤ 2, the loop frequency increases linearly with Û as fitted by the dotted line (1/T̂ = Û/16). (c) The dimensionless cycle performance R̂/T̂ (calculated from (a) and (b)) as a function of Û. (d) The dimensionless average swimming velocity V̂ = Vτ/a as a function of Û. Both R̂/T̂ in (c) and V̂ in (d) show the same Û-dependence except for a scaling factor. The ratio between these quantities is V̂T̂/R̂ ≈ 6.57 × 10^−3. The interpretation of this ratio is discussed in the text.
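The quantities in this caption can be evaluated from a sampled trajectory with a few lines of Python. The sketch below computes the enclosed loop area with the shoelace formula, multiplies it by the loop frequency, and evaluates the geometrical pre-factor 7/(12ℓ̂^2) quoted in the text. The function names and the square test loop are illustrative assumptions; the exact normalization of Eq. (14) is not reproduced here, and only the "enclosed area" interpretation stated above is used.

```python
def loop_area(points):
    """Signed area enclosed by a closed polygonal loop (shoelace formula).

    points : list of (u_A, u_B) samples along one cycle, in order.
    The result is positive for a counterclockwise loop.
    """
    area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the loop
        area += x0 * y1 - x1 * y0
    return 0.5 * area


def cycle_performance(points, period):
    """Loop area multiplied by the loop frequency 1/period."""
    return loop_area(points) / period


def geometric_prefactor(l_hat):
    """Pre-factor 7/(12 l_hat^2) quoted in the text."""
    return 7.0 / (12.0 * l_hat ** 2)


if __name__ == "__main__":
    # Illustrative counterclockwise square loop with corners at (+-2, +-2).
    square = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0)]
    print(loop_area(square))               # enclosed area of the square
    print(cycle_performance(square, 8.0))  # area times frequency for period 8
    print(geometric_prefactor(10.0))       # compare with 5.83e-3
```

For ℓ̂ = 10 the pre-factor evaluates to 7/1200 ≈ 5.83 × 10^−3, so the measured ratio 6.57 × 10^−3 is roughly 13% larger by this arithmetic.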
Fig. 6(d) the average swimming velocity V obtained from the actual displacement of the microswimmers. Both R̂/T̂ and V̂ in Figs. 6(c) and (d), respectively, show almost the same dependence on Û except for a scaling factor. The obtained ratio from Figs. 6(c) and (d) is V̂T̂/R̂ ≈ 6.57 × 10^−3 within the studied Û-range. This value can be compared with the geometrical pre-factor 7/(12ℓ̂^2) ≈ 5.83 × 10^−3 in Eq. (

FIG. 7. (a) The effective off-diagonal even elasticity k̂′ = k′/K for fully trained microswimmers as a function of Û [see Eq. (19)]. The Û-dependence is similar to that of the non-reciprocality R̂ in Fig. 6(a). (b) The effective odd elasticity k̂^o = k^o/K as a function of Û [see Eq. (20)]. The Û-dependence closely resembles that of the loop frequency 1/T̂ in Fig. 6(b).

FIG. 8. The configuration space trajectories at different training stages: (a) 5, (b) 10, (c) 20, and (d) 40 episodes for a microswimmer with Û = 2. Each trajectory is initiated from a distinct initial state, indicated by different colored dots. Training beyond stage (d) approaches the optimized limit cycle shown in Fig. 4(d), obtained after 200 episodes.