Central pattern generators evolved for real-time adaptation to rhythmic stimuli

Alex Szorkovszky; Frank Veenstra; Kyrre Glette

doi:10.1088/1748-3190/ace017

1. Introduction

Biologically inspired central pattern generators (CPGs) are useful for their properties, typical of self-organized systems, such as distributed control and robustness to perturbations [1, 2]. This allows adaptive behaviours such as compensation for physical damage [3] or walking in novel environments [4]. Spontaneous entrainment of motion patterns to sensory input is also expected from such systems, and adaptation of bio-inspired CPGs to body and environmental mechanics has indeed been widely demonstrated [5–8]. Likewise, the movement pattern itself can be determined by interactions with the environment [9, 10].

While there typically exists a 'natural' gait frequency for a particular gait with a particular body (i.e. one that maximizes energy efficiency), it is often necessary to modulate one's gait frequency. A cat, for example, may need to walk at a slow pace in order to ambush its prey. For robots, being able to adapt to humans in their vicinity is an important goal, particularly for caring and collaborative applications [11]. For early humans, it is widely thought that the ability to synchronize movements with others was a crucial step in the evolution of social cognition [12–14]. This is therefore a relevant capability for socially responsive and intelligent robotics.

In vertebrates, gait pattern and frequency modulation are indirectly controlled by the intensity of the current from the brain stem, to which the spiking rates of locomotor neuron populations are sensitive [15]. Early bio-inspired robotics work attempted to replicate this emergent pattern generation, typically with continuous-time recurrent neural networks [16, 17]. This connectionist approach was largely abandoned in favour of more manageable coupled limit-cycle oscillators, where each parameter's effect on the overall behaviour is predictable [1, 18, 19], particularly if the CPG outputs are mapped to workspace trajectories [20]. Hence, the problem of gait adaptation has become a matter of designing appropriate feedbacks or learning schemes for the designated control parameters. These continuous learning approaches have been highly successful for learning stable locomotion and adapting to unseen physical environments [21, 22].

Adapting to a social environment, however, is qualitatively different to adapting to a physical environment. One example of a social adaptation is imitation or learning by demonstration. This is now common in, for example, compliant robotic arms, where there are few constraints on movement [23]. However, for most CPG-based legged robot controllers, the range of available limit cycles is prescribed by design. Therefore, the potential for imitation and synchronization is relatively limited in most current legged robots.

The problem of large parameter spaces that comes with more flexible neuron models can be tackled using the same method by which nature succeeded in making animals roam the earth. Evolutionary methods are a useful tool for robotics, making use of the abundance of computing power now at our disposal [24, 25] and new algorithms for promoting diversity of designs [26, 27]. This has led to advances in morphology design [28], modular and soft robots [29, 30], and generative encodings [31].

While evolution and real-time adaptation may at first glance seem like unrelated processes working at very different time-scales, there are several ways in which evolution can facilitate adaptation. Evolution can, for example, globally optimize in the high-dimensional space of connection weights in recurrent neural networks so that a lower dimensional space of inputs encompasses a wide repertoire of output patterns [32, 33]. When applied to locomotion patterns, this dimensionality reduction can therefore simplify the task of online learning of which behaviours are most suitable in which state and environment, as is done in reinforcement learning [34], or the task of Hebbian learning of connection weights for higher-level control [35, 36]. Importantly, reactive controllers can also be optimized in advance for susceptibility to a wide range of inputs, allowing spontaneous compliant motion in an open-loop scenario.

Single-objective evolutionary algorithms were used in early work on connectionist CPGs, generally to optimize some combination of walking speed, regularity and stability measures [17, 37]. These measures are often involved in trade-offs, meaning that despite the 'hands-off' nature of evolutionary algorithms, fitness functions still needed to be carefully designed to weight each measure appropriately. The advent of multi-objective algorithms [26] allows these measures to be separated into their own fitness functions, so that a diverse range of controllers is generated along the Pareto front of non-dominated solutions [38–40]. Therefore, in addition to the advantages of evolutionary algorithms for highly flexible neural controllers, multi-objective evolution in particular also allows for correlational studies about their emergent properties, and for controllers to be hand-picked for different sets of capabilities after a single optimization process.

In this paper, we use multi-objective evolution to test the assumption that evolving CPGs for flexibility can facilitate rapid real-time gait adaptation. This contributes to bio-inspired robotics in two ways. Firstly, we demonstrate novel virtual robot quadrupeds that can entrain their locomotion to a range of rhythmic external inputs without physical coupling and without explicit feedback. This kind of automatic adaptation to social environments via audio and visual perception is common in humans, such as the tendency to synchronize when walking together [41]. While rhythmic entrainment to social partners has been demonstrated in virtual robots [42], this used relatively slow phase-based feedbacks in linear oscillators. Therefore, using neuromorphic CPGs can increase the naturalistic quality of multi-robot and human-robot interaction at the level of basic behaviours.

Secondly, we show how examining correlations between emergent properties of the CPGs can aid future design. Connectionist control systems are once again being increasingly employed in robotics, and these generally require searching through a large parameter space using automated processes such as reinforcement learning or genetic algorithms that target a desired ability [43, 44]. Identifying statistical trends between such abilities (costs or fitness functions) and features that can be more easily manipulated prior to optimization can greatly increase the efficiency of this time-consuming process [45].

For example, previous work with disembodied CPG architectures has suggested that both gait type and the sensitivity of oscillation period to neuron bias are important factors for entrainment ability [46]. We further test this correlation in an embodied context, where there is a simultaneous goal of upright walking. To test whether our results are dependent on the morphology of the embodied system, we use two quadrupeds that differ in limb length. Wide applicability is important for interaction between robots optimized for different environments, or those that adapt their morphology in real time [47].

First, using a multi-objective genetic algorithm and a fitness evaluation during which two control parameters are swept, we evolve populations of CPGs for flexible walking speed and direction. Then, for a subset of CPGs that emphasize different components of the objective function, we incrementally evolve robust filters for rhythmic (such as audio) input. We analyse the real-time entrainment performance of the robots as a function of the CPG properties, in particular the flexibility of the gait period and pattern. Finally, we discuss the implications of our results for understanding adaptive and imitative behaviour, as well as the potentials of our approach for multi-robot systems, human-robot interaction and autonomous learning.

2. Methods

2.1. Neuron model

The neural model is based on the Matsuoka neuron [48], a simple and popular model for robotics [8, 20, 49–51]. This is a biologically motivated yet abstract two-variable model:

$\begin{align} t_0 \frac{du_i}{dt} & = -u_i -av_i + I_i(t) \end{align} \tag{ 1 }$

$\begin{align} t_0 \frac{dv_i}{dt} & = -\gamma v_i + bh(u_i)\;\; \end{align} \tag{ 2 }$

where h(u) is a rectified linear unit: $h(u) = 0$ for $u\leqslant0$ and $h(u) = u$ for $u > 0$ . Like many biological models, there is a fast 'spiking' variable (u_i) and a slow 'recovery' variable (v_i).

While this model produces patterns of spiking in a simple way, its linearity leads to a poor ability to adapt oscillation frequency [52]. Importantly, unlike biological neurons, the spiking rate is insensitive to changes in tonic input [53]. To address this shortcoming, we add a sigmoidal 'deactivation' function to the fast variable u_i

$\begin{equation} t_0 \frac{du_i}{dt} \!=\! -u_i \!-\!aS(\kappa[u_i-u_0]) v_i \!+\! c_i \!+\! d_i I_\mathrm{DC} \!+\! I_{\mathrm{AC},i}(t) \end{equation} \tag{ 3 }$

$\begin{align} t_0 \frac{dv_i}{dt} & = -\gamma v_i + bh(u_i) \; , \end{align} \tag{ 4 }$

where $S(x) = 1/(1+\exp(x))$ . In addition, we introduce $I_\mathrm{DC}$ , a control parameter modelling the global brain stem input, weighted per neuron by d_i , separately from the fluctuating inputs $I_{\mathrm{AC},i}$ and a constant offset c_i . For a certain parameter range satisfying

$\begin{equation} c_i + d_i I_\mathrm{DC} > u_0 + \frac{2}{\kappa} \; , \end{equation} \tag{ 5 }$

this reproduces the general nullcline shape, as well as the input-dependent firing rate, of biological neuron models [54]. The fast input is modelled as:

$\begin{align} I_{\mathrm{AC},i}(t) = G_i I_{\mathrm{in}}(t) + I_{\mathrm{fb},i}(t) + \sum_{j \neq i} w_{ij}h(u_j(t) - \tau_{ij}) \end{align} \tag{ 6 }$

where $I_{\mathrm{fb},i}$ is sensory feedback and $I_\mathrm{in}$ is the external input, G_i is input sensitivity, and w_ij is the synaptic weight and τ_ij the threshold for a connection from neuron j to neuron i.
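As an illustration of equations (3)–(6), the following minimal sketch advances a small network by one forward-Euler step. The integration scheme, the function names (`step`, `demo_params`) and all numerical values in `demo_params` are assumptions made for illustration only; the evolved parameter ranges are those of supplementary table S1, and the units of the 8 ms step relative to t_0 are likewise assumed.

```python
import math

def S(x):
    """Decreasing sigmoid S(x) = 1 / (1 + exp(x)), as defined after equation (4)."""
    x = max(-60.0, min(60.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(x))

def h(u):
    """Rectified linear unit."""
    return u if u > 0.0 else 0.0

def demo_params():
    """Two mutually inhibiting neurons; every value here is a placeholder."""
    return {"t0": 0.1, "a": 2.5, "b": 2.5, "gamma": 0.5, "kappa": 4.0, "u0": 0.0,
            "c": [1.0, 1.0], "d": [1.0, 1.0], "G": [0.0, 0.0],
            "w": [[0.0, -1.5], [-1.5, 0.0]],
            "tau": [[0.0, 0.0], [0.0, 0.0]]}

def step(u, v, p, I_dc, I_in=0.0, I_fb=None, dt=0.008):
    """Advance all neurons by one forward-Euler step of equations (3)-(6)."""
    n = len(u)
    I_fb = I_fb or [0.0] * n
    du, dv = [], []
    for i in range(n):
        # Equation (6): external input, sensory feedback, thresholded synapses.
        syn = sum(p["w"][i][j] * h(u[j] - p["tau"][i][j])
                  for j in range(n) if j != i)
        I_ac = p["G"][i] * I_in + I_fb[i] + syn
        # Equation (3): fast variable with sigmoidal deactivation of recovery.
        du.append((-u[i] - p["a"] * S(p["kappa"] * (u[i] - p["u0"])) * v[i]
                   + p["c"][i] + p["d"][i] * I_dc + I_ac) / p["t0"])
        # Equation (4): slow recovery variable.
        dv.append((-p["gamma"] * v[i] + p["b"] * h(u[i])) / p["t0"])
    return ([u[i] + dt * du[i] for i in range(n)],
            [v[i] + dt * dv[i] for i in range(n)])
```

With the placeholder values above and $I_\mathrm{DC} = 0.5$ , the condition in equation (5) reads 1.5 > 0.5 and is satisfied; whether a given parameter set actually oscillates still has to be checked by simulation.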

2.2. CPG and filter modules

The layout of the quadruped is shown in figure 1(A). This is a simplified version of the CPG layout in [15] containing one flexor-extensor pair per limb and several interneuron types. Our simplified model consists of three neurons per limb: one interneuron, a leg joint neuron (A) and a knee joint neuron (B). Each limb has identical parameters, and the connection weights are constrained to obey lateral symmetry. The ranges of the 23 neuron parameters and connection weights are given in supplementary table S1. For the CPG module, $G_i = 0$ and $\tau_{ij} = 0$ within the CPG. Depending on the connection weights, which could be excitatory or inhibitory, and the brain stem drive, the CPG can autonomously generate coordinated oscillations in the motor neurons. Flexible movement was targeted for this autonomous behaviour in the first stage of evolution (see section 2.4).

A subset of the evolved CPGs with high fitness had filter modules evolved on top (see section 2.5). The purpose of this layer is to preprocess and distribute descending rhythmic signals $I_\mathrm{in}(t)$ for the CPG to entrain its oscillations to. This module had only inhibitory connections, no lateral symmetry, no feedback and no brain-stem control ( $d_i = 0$ ). A non-zero threshold $\tau_{ij} = \tau_0$ was imposed between the filter outputs and CPG inputs so that the CPG received no input from the filter when $I_\mathrm{in}(t) = 0$ . The modules are unidirectionally linked by the weight matrix M, with a wider range than the within-module weights w_ij (see supplementary table S2).

2.3. Robot simulation

The quadruped robots were simulated in the Unity game engine on a flat planar surface. The full-scale robot was based on the specifications of the Open Dynamic Robot [55]. The 'short-legged' version was identical, apart from a 40% reduction in upper leg length and a 33% reduction in lower leg length. The controllers were written in Python [56] and interfaced with Unity using the Unity ML-agents package [57].

The CPG used a time interval of 8 ms while the physics simulation used a time interval of 20 ms, with decisions made every $\Delta t =$ 100 ms. At each decision point, the CPG sent joint positions based on the changes in rectified motor neuron outputs since the previous simulation step $\Delta h(u_A(t))$ and $\Delta h(u_B(t))$ :

$\begin{align} \theta_\mathrm{leg}(t) & = \theta_{0,\mathrm{leg}} \nonumber\\ & \quad \!+\! \theta_{\mathrm{lim,leg}} \left[2S\left(\frac{2A}{\theta_\mathrm{lim,leg}} \frac{\Delta h(u_A(t))}{\Delta t} \right)\!-\!1\right] \!+\! \theta_{C} \end{align} \tag{ 7 }$

$\begin{align} \theta_\mathrm{knee}(t) & = \theta_{0,\mathrm{knee}} \nonumber\\ & \quad+ \theta_{\mathrm{lim,knee}} \left[2S\left(\frac{2B}{\theta_\mathrm{lim,knee}} \frac{\Delta h(u_B(t))}{\Delta t} \right)-1\right] \end{align} \tag{ 8 }$

where S(x) is a logistic function, limiting the half-amplitude of motion of the leg and knee joints to $\theta_\mathrm{lim,leg}$ and $\theta_\mathrm{lim,knee}$ , respectively, both of which are set to $90^\circ$ . The coefficients A and B and the zero-angles are allowed to evolve, with the leg zero-angle $\theta_{0,\mathrm{leg}}$ always positive and the knee zero-angle $\theta_{0,\mathrm{knee}}$ always negative, corresponding to a full-elbow pose. The hip joint, perpendicular to the leg and knee joints, was kept at a constant but evolvable parameter $\theta_{0,\mathrm{hip}}$ . See supplementary table S3 for the ranges of these evolvable parameters. A control parameter θ_C was also added to the leg joint angle so that the forward position of the centre of mass could be controlled in real time (see next section).
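The mapping in equations (7) and (8) can be sketched as a single function. This is a minimal illustration: the name `joint_angle` and its argument names are hypothetical, and S is taken to be the decreasing sigmoid defined in section 2.1. If the standard increasing logistic function is intended here, the sign of the deflection flips.

```python
import math

def S(x):
    """S(x) = 1 / (1 + exp(x)), the form given in section 2.1 (assumed here)."""
    x = max(-60.0, min(60.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(x))

def joint_angle(theta_0, theta_lim, gain, delta_h, delta_t=0.1, theta_c=0.0):
    """Equations (7)/(8): squash the rate of change of the rectified output
    so that the deflection around theta_0 never exceeds theta_lim."""
    rate = delta_h / delta_t
    squashed = 2.0 * S((2.0 * gain / theta_lim) * rate) - 1.0  # in [-1, 1]
    return theta_0 + theta_lim * squashed + theta_c

# Leg joint: pass gain=A and theta_c; knee joint: pass gain=B and theta_c=0.
```

For small rates of change the bracketed term is approximately linear, so the deflection has magnitude of about `gain * rate`; the coefficients A and B therefore act as small-signal gains, while $\theta_\mathrm{lim}$ only matters for large excursions.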

At each decision point, the CPG also received inputs processed from the body's tilt, to use as stabilizing feedback inputs $I_{\mathrm{fb},i}$ . The sideways tilt (the sideways component of the unit vector normal to the top of the body) was input with opposite signs to the left and right limb motor neurons, with separate coefficients $q_\mathrm{A,side}$ and $q_\mathrm{B,side}$ for neurons A and B, respectively. Likewise, the front-back tilt (the upwards component of the unit vector normal to the front of the body) was input with opposite signs to the front and back limb inputs, with coefficients $q_\mathrm{A,front}$ and $q_\mathrm{B,front}$ . These four coefficients were also evolved along with the CPG. The entire set of 32 parameters for the CPG and body was encoded as a sequence of integers, each taking a value between 1 and 10.
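The sign structure of the tilt feedback can be sketched as follows. This is only an illustration: the limb labels, the choice of which side receives the positive sign, and the additive combination of the two tilt terms are assumptions, and `tilt_feedback` is a hypothetical helper name.

```python
# Assumed sign convention: (sideways sign, front-back sign) for each limb.
LIMB_SIGNS = {
    "front_left": (+1, +1),
    "front_right": (-1, +1),
    "back_left": (+1, -1),
    "back_right": (-1, -1),
}

def tilt_feedback(side_tilt, front_tilt, q):
    """Return {limb: (I_fb for neuron A, I_fb for neuron B)}.

    q holds the four evolved coefficients under the keys
    'A_side', 'B_side', 'A_front' and 'B_front'.
    """
    feedback = {}
    for limb, (s_side, s_front) in LIMB_SIGNS.items():
        feedback[limb] = (
            s_side * q["A_side"] * side_tilt + s_front * q["A_front"] * front_tilt,
            s_side * q["B_side"] * side_tilt + s_front * q["B_front"] * front_tilt,
        )
    return feedback
```

A pure sideways tilt therefore drives the left and right limbs with equal magnitude and opposite sign, and a pure front-back tilt does the same for the front and back limbs.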

2.4. CPG evolution

We used the NSGA3 algorithm [26] in the DEAP Python package [58] to perform multi-objective optimization. This genetic algorithm preferentially selects non-dominated individuals (i.e. those on the Pareto front) for propagation to the next generation. Parameters used for the NSGA3 algorithm are given in supplementary table S4.

Unlike in [46] where each CPG was iteratively evaluated to explicitly select for change in period as a function of the brain stem drive $I_\mathrm{DC}$ , we instead swept $I_\mathrm{DC}$ during a single evaluation, and partially selected for variability in speed. This was to reduce the computational load of running several evaluations with constant parameters as is required for reliable period estimation, and to ensure stable walking over a wide region of parameter space. In addition, we adjusted the centre of mass parameter θ_C that is added to the leg standing angle during the evaluation, in order to select for flexibility of movement direction. Measurements of forward and perpendicular distance covered (y_j and x_j respectively) were made over three stages (j = 1 to j = 3) of length 10 s each, as shown in figure 2(A).

**Figure 2.** CPG evaluation and evolution. (A) Sweeping of control parameters $I_\mathrm{DC}$ and θ_C over the course of a single evaluation. The CPG undergoes a 'burn-in' period of 8 s prior to the t = 0 mark. For the first 2 s of the simulation, the actuators are ramped from zero to the full CPG output. The arrows above show the measured fitnesses and targeted behaviours during each stage. (B) Maximum fitnesses vs generation. Solid lines: normal quadruped; dotted lines: short-legged variant. Each line is the median over five replicates. Shaded areas are the range over the five replicates for the normal quadruped. Areas inside the box on the right side of the plot are the final ranges of maximum fitness for the short-legged variant.

We used four fitness functions to simultaneously select for desired capabilities of the robot. The first three correspond to the targeted behaviours for each stage of the evaluation (fast backward motion, steady forward motion and fast forward motion, respectively), while the fourth selects for overall stability. The fitnesses are given as:

$\begin{align} F_1 & = -y_1 - \left(\frac{x_1}{x_0}\right)^2 \qquad \quad \end{align} \tag{ 9 }$

$\begin{align} F_2 & = 2y_0 y_2 - y_2^2 - \left(\frac{x_2}{x_0}\right)^2\;\; \end{align} \tag{ 10 }$

$\begin{align} F_3 & = y_3 - \left(\frac{x_3}{x_0}\right)^2 \qquad \quad \;\;\; \end{align} \tag{ 11 }$

$\begin{align} F_4 & = \frac{y_0^2 H_\mathrm{tot}}{1 + t_\mathrm{tot}} \end{align} \tag{ 12 }$

where $x_0 = \sqrt{5}$ m is a parameter to punish sideways movement, $y_0 = 2.5$ m is an optimal forward distance for steady motion, defining the maximum F₂ and F₄, $H_\mathrm{tot}$ is the mean height over the entire evaluation (normalized for a maximum of ≈1), and $t_\mathrm{tot}$ is the root mean square body tilt (mean length of the cross product $\hat{\boldsymbol{n}} \times \hat{\boldsymbol{g}}$ where $\hat{\boldsymbol{n}}$ is the unit vector normal to the top of the robot and $\hat{\boldsymbol{g}}$ is the unit vector normal to the ground).

Note that F₁ and F₃ are unbounded and identical but with opposite signs for the forward distance. F₂, meanwhile, is a quadratic function that is positive only between $y_2 = 0$ and $y_2 = 2y_0$ . To optimize all four fitness functions simultaneously, the robot must first walk backwards with a negative centre of mass parameter, then walk forwards 2.5 m with a positive centre of mass parameter and brain stem input of $I_\mathrm{DC} = 0.5$ , and then accelerate with $I_\mathrm{DC}$ increasing.
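The four objectives can be written compactly as below. This is a sketch under one stated assumption: the sideways penalty of each stage uses that stage's own sideways distance x_j. The function name `cpg_fitnesses` and its argument layout are illustrative.

```python
import math

X0 = math.sqrt(5.0)  # m, scale of the sideways-movement penalty
Y0 = 2.5             # m, target forward distance for the steady stage

def cpg_fitnesses(y, x, h_tot, t_tot, x0=X0, y0=Y0):
    """Return (F1, F2, F3, F4) from per-stage distances.

    y, x   : forward and sideways distances for stages 1..3 (metres)
    h_tot  : normalized mean body height over the evaluation
    t_tot  : root mean square body tilt over the evaluation
    """
    y1, y2, y3 = y
    x1, x2, x3 = x
    f1 = -y1 - (x1 / x0) ** 2                      # fast backward motion
    f2 = 2.0 * y0 * y2 - y2 ** 2 - (x2 / x0) ** 2  # steady forward motion
    f3 = y3 - (x3 / x0) ** 2                       # fast forward motion
    f4 = y0 ** 2 * h_tot / (1.0 + t_tot)           # upright stability
    return f1, f2, f3, f4
```

For a robot that walks 1 m backwards, then exactly 2.5 m forwards, then 4 m forwards, with no sideways drift, full height and no tilt, this gives (1.0, 6.25, 4.0, 6.25), so F₂ and F₄ both reach their shared upper limit of $y_0^2$ .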

During the evolution, the evaluation was run three times per individual with random initial u_i values, and the median of each fitness was taken. For each morphology, 5 independent populations of 168 individuals (8 individuals per edge of the reference Pareto front) were evolved for 200 generations. The final populations were then evaluated 15 times with different random number generator seeds and the median fitnesses were calculated again, as well as cross-correlations between limbs, and oscillation periods from the largest autocorrelation peak (see supplementary material).
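The population size can be related to the number of NSGA3 reference points with a short calculation. One reading that appears consistent with the stated numbers is that the reference simplex is divided into p = 8 intervals per edge and that the population is then rounded up to a multiple of four; both points are assumptions here rather than statements from the text.

```python
from math import comb

def n_reference_points(n_obj, p):
    """Number of uniformly spaced reference points on the unit simplex
    with n_obj objectives and p divisions per edge."""
    return comb(n_obj + p - 1, p)

def population_size(n_obj, p):
    """Next multiple of four strictly above the reference-point count
    (an assumed rounding rule)."""
    h = n_reference_points(n_obj, p)
    return h + (4 - h % 4)
```

With four objectives and p = 8 this gives 165 reference points and a population of 168. The same rule with three objectives and p = 12 gives 91 points and a population of 92, matching the filter-evolution population in section 2.5.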

2.5. Filter evolution

In order to reduce the total number of evolutions for the filter layer, a subset of CPGs was chosen from each population's Pareto front in order to capture a numerically small but diverse range of solutions. Each population was first reduced to a set of CPGs for which all fitnesses were positive, and then four were chosen from each using the maxima of four weighted fitness functions $F_m^*$ . These are defined as a combination of F_m and the total sum of fitnesses:

$\begin{equation} F_m^* = zF_m + \sum_{k = 1}^4 F_k \; , \end{equation} \tag{ 13 }$

where z was incremented in intervals of one until the maximum of each $F_m^*$ was unique.
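The selection rule of equation (13) can be sketched as a short loop. The starting value z = 1, the cap `z_max` and the function name `select_cpgs` are assumptions for illustration; the cap covers the case where fewer than four distinct maximizers exist.

```python
def select_cpgs(fitnesses, z_max=100):
    """Pick one CPG per objective m by maximizing F*_m = z*F_m + sum_k F_k.

    fitnesses : list of (F1, F2, F3, F4) tuples, one per CPG
    Returns the indices of the four picks (possibly with repeats if no z
    up to z_max separates them).
    """
    # Keep only CPGs for which all four fitnesses are positive.
    eligible = [i for i, f in enumerate(fitnesses) if all(v > 0 for v in f)]
    if not eligible:
        return []
    picks = []
    for z in range(1, z_max + 1):
        picks = [
            max(eligible, key=lambda i: z * fitnesses[i][m] + sum(fitnesses[i]))
            for m in range(4)
        ]
        if len(set(picks)) == 4:  # all four maximizers are distinct
            break
    return picks
```

Small z favours all-round individuals with a high total fitness, while increasing z shifts each pick towards a specialist in objective m, so the loop stops at the least specialized set of four distinct CPGs.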

For each of these CPGs, a 6-neuron filter module was evolved using the NSGA3 algorithm. The parameters consisted of 6 input weights, 24 output weights, 30 inhibitory connection weights within the layer, a shared bias term c_i , and the time constant of a low-pass filter for the initial input.

The filter evolution comprised two stages of increasing complexity. The input consisted of evenly spaced impulses with every fourth impulse missing, over a total of 40 s. For the first 50 generations, the timings had no noise, in order to facilitate the random generation of suitable filters. After the 50th generation, a random timing offset (Gaussian distributed with a standard deviation of 2% of the period) was applied to each impulse's timing. Evaluations were made with constant control parameters $\theta_C = 0$ and $I_\mathrm{DC} = 0.5$ .
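The stimulus described above can be generated as a list of impulse times. In this sketch the position of the missing impulse within each group of four and the helper name `impulse_times` are assumptions; setting `jitter_frac=0.02` reproduces the 2% Gaussian timing noise used after the 50th generation.

```python
import random

def impulse_times(period, duration=40.0, jitter_frac=0.0, seed=None):
    """Evenly spaced impulse times with every fourth impulse omitted.

    jitter_frac : standard deviation of the Gaussian timing offset,
                  as a fraction of the period (0 gives noise-free timing)
    """
    rng = random.Random(seed)
    times = []
    k = 0
    while k * period < duration:
        if (k + 1) % 4 != 0:  # skip impulses number 4, 8, 12, ...
            times.append(k * period + rng.gauss(0.0, jitter_frac * period))
        k += 1
    return times
```

For example, `impulse_times(1.0, duration=8.0)` returns impulses at 0, 1, 2, 4, 5 and 6 s, leaving gaps at 3 s and 7 s.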

Three input periods were used for each evaluation: $T_0/\phi$ , T₀ and $\phi T_0$ , where T₀ is the CPG period at $\theta_C = 0$ and $I_\mathrm{DC} = 0.5$ , and φ = 0.618. The latter was chosen so that $1/\phi \approx 1+\phi$ and hence the low-period and high-period inputs are equidistant from an integer multiple of T₀.
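Written out with φ = 0.618, the spacing of the two off-resonance periods is:

$\begin{align} T_0/\phi & \approx 1.618\,T_0 = 2T_0 - 0.382\,T_0 \nonumber\\ \phi T_0 & = 0.618\,T_0 = T_0 - 0.382\,T_0 \end{align}$

so each lies approximately $0.382\,T_0$ below its nearest integer multiple of T₀ (2T₀ and T₀, respectively).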

The fitness function for each input period $T_{\mathrm{in},k}$ , with measured walking period $T_{\mathrm{out},k}$ , was calculated as

$\begin{equation} F_{fk} = H_{\mathrm{tot},k} Q_k \end{equation} \tag{ 14 }$

where

$\begin{align} Q_k = \left(1 + \frac{1}{\epsilon} \left|\frac{2T_{\mathrm{out},k}}{T_{\mathrm{in},k}}-\left[\frac{2T_{\mathrm{out},k}}{T_{\mathrm{in},k}}\right]\right| + \frac{\sigma_0}{\sigma_t}\right)^{-1} \; , \end{align} \tag{ 15 }$

[.] indicates rounding to the nearest integer, σ₀ is the mean standard deviation of the filter output with no input, and σ_t and ε are scaling thresholds, both set to 0.1 for the current study. Hence, the fitness is maximized for upright walking with a period of a half-integer or integer multiple of the input period.
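Equations (14) and (15) translate directly into code. The function names below are illustrative, and rounding half-way cases upwards is an assumption that does not affect the value of Q_k.

```python
import math

def entrainment_score(t_out, t_in, sigma_0, eps=0.1, sigma_t=0.1):
    """Q_k from equation (15).

    t_out   : measured walking period
    t_in    : input (stimulus) period
    sigma_0 : mean standard deviation of the filter output with no input
    """
    ratio = 2.0 * t_out / t_in
    nearest = math.floor(ratio + 0.5)  # round to the nearest integer
    return 1.0 / (1.0 + abs(ratio - nearest) / eps + sigma_0 / sigma_t)

def filter_fitness(h_tot, t_out, t_in, sigma_0):
    """F_fk from equation (14): entrainment score weighted by mean height."""
    return h_tot * entrainment_score(t_out, t_in, sigma_0)
```

A walking period equal to 0.5, 1 or 1.5 times the input period with a silent filter at rest (σ₀ = 0) gives Q_k = 1, whereas a filter whose resting output fluctuates with σ₀ = σ_t halves the score even under perfect period locking.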

The filter evolution used a population of 92 individuals (12 individuals per edge of the reference Pareto front) for 150 generations. At the final generation, the population was evaluated five times and the median fitnesses were calculated. This final evaluation included two additional periods at $T_0/\sqrt{\phi}$ and $\sqrt{\phi} T_0$ . The filter with the highest $\sum_k Q_k$ was then chosen for each CPG, conditional on each $H_{\mathrm{tot},k}$ being above 0.75.

3. Results

3.1. CPG evolution

From the CPG evolution, 710 unique individuals were produced. As shown in figure 2(B), the fitnesses with upper limits (F₂ and F₄) reached these within a few generations, while the others reached a plateau close to the 200 generation mark.

3.2. Gait characteristics

After filtering out CPGs with an average height of $< 0.75$ (below which robots were typically judged to be crawling rather than walking, see supplementary figure S1), CPGs were classified into gait types. Walking gaits were defined as those with maximum inter-limb correlation $< 0.3$ ; above this threshold, trotting gaits were defined as those with diagonally opposite limbs maximally correlated, pacing gaits as those with left or right leg pairs maximally correlated, and bounding gaits as those with front or back limbs maximally correlated. Of the 420 non-crawling individuals at $I_\mathrm{DC} = 0.5,\theta_C = 0.016$ , 19.8% were classed as walking, 73.8% as trotting, 2.4% as pacing and 4.0% as bounding. The long-legged robot was more likely to develop a walking gait (29% vs 6%) or bounding gait (7% vs 0%), while the short-legged variant was more likely to develop a pacing gait (6% vs 0%).
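The classification rule can be sketched as a lookup on the most strongly correlated limb pair. The limb labels and the function name `classify_gait` are hypothetical, and the way the pairwise correlations themselves are computed (see supplementary material) is not reproduced here.

```python
# Gait implied by whichever limb pair is most strongly correlated.
PAIR_TYPE = {
    frozenset(("FL", "BR")): "trot",   # diagonal pairs
    frozenset(("FR", "BL")): "trot",
    frozenset(("FL", "BL")): "pace",   # same-side pairs
    frozenset(("FR", "BR")): "pace",
    frozenset(("FL", "FR")): "bound",  # front pair and back pair
    frozenset(("BL", "BR")): "bound",
}

def classify_gait(corr, threshold=0.3):
    """corr maps (limb_a, limb_b) tuples to inter-limb correlations."""
    pair, value = max(corr.items(), key=lambda kv: kv[1])
    if value < threshold:
        return "walk"  # no limb pair moves together strongly enough
    return PAIR_TYPE[frozenset(pair)]
```

For example, a CPG whose diagonal pairs have correlation 0.9 while all other pairs are negative is classed as trotting, and one whose largest pairwise correlation is 0.2 is classed as walking.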

3.3. Predictors of F3

Many CPGs (14%) achieved all-positive fitnesses, which means that they could walk backwards and then forwards at a controlled speed as the leg standing angle θ_C is switched from negative to positive. Of particular interest for entrainment is F₃, the target for acceleration with increasing $I_\mathrm{DC}$ . This was found to be significantly larger on average in walking and pacing gaits compared to trotting and bounding, contrary to the typical order of quadruped gaits (see figure 3(A)).

**Figure 3.** F₃ as a function of morphology and CPG properties, for all upright robots. The box plot (A) shows the distributions of F₃ sorted by morphology and gait types. Numbers in parentheses indicate the number of each gait type in the population. In panel (B) F₃ is plotted as a marker colour against the oscillation period and maximum interlimb correlation of each CPG. For both plots, CPGs are evaluated at $I_\mathrm{DC} = 0.5, \theta_C = 0.016$ . Panel (C) shows images from the simulation for example individuals displaying each of the gaits and body combinations in panel (A).

We examined whether F₃ selected for period or gait flexibility as predicted. For both morphologies, F₃ had a non-monotonic relationship with period and inter-limb correlation at $I_\mathrm{DC} = 0.5$ , as shown by figure 3(B). A clear correlation was seen, however, between F₃ and the change in a CPG's inter-limb correlation as the brain-stem drive was changed from $I_\mathrm{DC} = 0.5$ to $I_\mathrm{DC} = 1.0$ . This was shown by a linear mixed-effect model with sums and differences of the periods and inter-limb correlations as fixed effects, and replicate as a random effect (long-legged: $z = -2.17, P = 0.03$ ; short-legged: $z = -3.26, P = 0.001$ , see supplementary figure S3). This indicates that acceleration leading to a high F₃ could be achieved by moving from a correlated trotting gait towards a more efficient walk-like gait. In addition, for the short-legged morphology, a period that shortens with increasing $I_\mathrm{DC}$ is associated with a higher F₃ score ( $z = -2.72, P = 0.007$ , see supplementary material).

3.4. Direction and speed tuning

The evolved robots typically show regions of smooth change of speed and direction within the space of control parameters. These regions often extend outside the region of the control parameter sweeps. Two examples are shown in figure 4. For the CPG that maximized the average CPG evolution fitnesses (equations (9)–(12)), the brain stem drive $I_\mathrm{DC}$ has relatively little effect on the movement characteristics. This illustrates that high average fitness is itself not a guarantee of general flexibility. Further outside the region of the swept control parameters, the behaviour becomes more unpredictable. For example, the low measured height in figure 4(A) for low drive parameter and negative θ_C indicates an instability that coincides with a gait transition, as shown by an abrupt change in inter-limb correlation in the same region.

**Figure 4.** Properties of the CPG with (A) the highest average fitness and (B) the highest negative change in period with $I_\mathrm{DC}$ as a function of the two control parameters when both are held constant. The black line indicates the range of parameters swept during the evolution. Corr.: maximum inter-limb correlation coefficient.

3.5. Filter evolution

Each morphology had 19 CPGs chosen for filter evolution out of the planned 20. One long-legged replicate only produced three unique CPGs from equation (13), while one selected CPG from the short-legged replicates had no measurable walking period, so an input period could not be determined.

The evolution of the filter module is shown in figure 5(A). The short-legged morphology converged to the maximum fitness sooner than the long-legged morphology. In general, it was more difficult to entrain to a rhythmic input shorter than the natural walking period, compared to a longer period input.

**Figure 5.** Filter evolution. Panel (A) shows the maximum fitnesses F_fk vs generation for the three input periods. Solid lines: normal quadruped; dotted lines: short-legged variant. Each line is the median over 19 CPGs used as the basis. Shaded areas are the interquartile range over the 19 CPGs for the normal quadruped. Areas inside the box on the right side of the plot are the final interquartile ranges of maximum fitness for the short-legged variant. Panel (B) shows the total entrainment ability score $\sum_{k = 1}^3 Q_k/3$ for all robots over the height threshold for each k, as a function of period flexibility (gradient in period vs brain stem drive, with $\Delta I_\mathrm{DC} = 0.2$ ). The dotted line is the best fit from the linear regression.

3.6. Predictors of entrainment

When taking the highest eligible mean of the entrainment performance Q_k , a correlation was found between the period tunability and entrainment performance (linear model: $t = -2.26,P = 0.03$ , see figure 5(B)). Faster oscillation with increasing $I_\mathrm{DC}$ therefore facilitates entrainment, while faster oscillation with decreasing $I_\mathrm{DC}$ appears to inhibit this ability. For the highest fitness filter and CPG combinations, entrainment could be generalized to stimulus periods other than those used during evolution, as shown in figure 6. Adjustment to the stimulus turning on or off typically occurred within a few motion cycles. To show this, time series for the leg joint angles were convolved with a Morlet wavelet at the input period, with a resolution parameter σ = 1.5, and then a Gaussian filter was applied with a width of 0.5 s. Interestingly, the robot could generate a response that created a polyrhythm with the input (namely, two steps for every three impulses), as shown in figure 6(A).

**Figure 6.** Entrainment to an isochronous stimulus at (A) 80% and (B) 125% of the natural period, respectively, for the CPG with highest negative change in period with $I_\mathrm{DC}$ . The stimulus is started at the 8 s mark, and ends at the 16 s mark. Black ticks show the impulse times for the stimulus, and the peaks of the leg output below, corresponding to the extent of the forward swings in radians. Sync: output from the smoothed Morlet wavelet convolution, normalized to a maximum of one.

4. Discussion

Our highly nonlinear, bio-inspired CPG combined with multi-objective optimization was successful in generating both a variety of gait profiles and properties and, for a large subset of individuals, flexibility in these gaits. This was despite the fact that the fitness functions did not straightforwardly translate to specific gait properties.

Trots were the most favoured gait type for both morphologies, seemingly due to this gait's ability to transition from backwards to forwards motion. The short-legged robot was more predictable, as evidenced by its smaller range of fitnesses for each gait type, and evolved faster in the case of the filter layer. Hence, it can be used as a starting point for building complex behaviour in stages [59].

Notably, the gait frequency emerged in a self-organized fashion from the interactions in the CPG network and, due to the added nonlinearities of the neural model, was also sensitive to inputs. Oscillation periods could therefore be tuned both manually via the brain stem drive parameter, and automatically via spontaneous entrainment to fluctuating sensory input. Due to the fully self-organized nature of the entrainment, this occurred much more rapidly than in feedback-based approaches [5, 42, 60]. We expect similar results if utilizing other neuron models with input-dependent frequency, such as the FitzHugh–Nagumo model [61] or the Rowat–Selverston model [36]. Phase synchronization to stimulus may be achieved in the future by adding feedback from reaction forces, such as with the Tegotae approach [62].

The specific results in this article can also be used to further direct the evolution of desired behaviours. For example, entrainment appears to benefit from a period that decreases with input, as occurs in single biological neurons. Restricting parameters to ensure this can increase the speed of evolution and the likelihood of entrainment. While the fitness targeting fast forward movement was significantly correlated with period flexibility, the period ranges exhibited were not as wide as when directly using the latter explicitly as a fitness function, as was done in [46]. Therefore, a combined approach where a disembodied CPG is first evolved may be beneficial in future work.

Rhythmic entrainment is a complex behaviour seen only in very few species [63], and is thought to be an integral part of the evolution of human social behaviour. Our CPG network mediated by a filter layer successfully captures this important example of cortical shaping of cyclical movements. Our heavily bio-inspired approach offers a path towards testing theories of human cognitive processes, such as beat perception, that are still not well understood [64, 65]. The results we present show that dynamic attending theory, based on synchronization of endogenous rhythms [66, 67], is a viable explanation for beat perception also when involving an entire distributed sensorimotor system.

On the engineering side, this approach is highly relevant to the current push towards adaptive robot behaviour [2, 11, 23]. In this study, the filter layer was optimized by a genetic algorithm. However, by instead adding Hebbian plasticity [68], reinforcement learning [34] or another mechanism for longer-term adaptation, it may be possible for robots to learn suitable new movement patterns through repeated imitation of robot or human demonstrators in an unsupervised manner, and hence develop useful behaviours autonomously [69].

Future work will implement physical robots with performance metrics addressing movement efficiency and stability in different physical environments. We will also explore mutual adaptation of multiple robots by using foot sensors to transmit impulses to neighbours. Notably, the fact that the communication medium is a simple time series means that co-ordination can occur across differing morphologies. This is also likely to increase the complexity of the behaviours, as is typical in studies of collective motion [70], and may allow for creative uses, such as human-robot musical ensembles [71].

Acknowledgments

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement No. 101030688, and is partially supported by the Research Council of Norway through its Centres of Excellence scheme, Project Number 262762. The authors would also like to thank Caroline Palmer and Anne Danielsen for helpful discussions.

Data availability statement

The data that support the findings of this study are openly available at the following URL: https://github.com/aszorko/COROBOREES/tree/Paper2.
