A Cerebellum-Inspired Learning Approach for Adaptive and Anticipatory Control

The cerebellum, which is responsible for motor control and learning, has been suggested to act as a Smith predictor for compensation of time-delays by means of internal forward models. However, insights about how forward model predictions are integrated in the Smith predictor have not yet been unveiled. To fill this gap, a novel bio-inspired modular control architecture that merges a recurrent cerebellar-like loop for adaptive control and a Smith predictor controller is proposed. The goal is to provide accurate anticipatory corrections to the generation of the motor commands in spite of sensory delays and to validate the robustness of the proposed control method to input and physical dynamic changes. The outcome of the proposed architecture is compared with that of two other control schemes that do not include either the Smith control strategy or the cerebellar-like corrections. The results obtained on four sets of experiments confirm that the cerebellum-like circuit provides more effective corrections than when only the Smith strategy is adopted and that minor tuning in the parameters, fast adaptation and reproducible configuration are enabled.


Introduction
Human beings show highly skilled motor responses in spite of their complex musculoskeletal dynamics, transport delays and response latencies. A reactive feedback system, originating in the brain-stem and spinal cord, and a feed-forward anticipatory system, originating in the cerebellum, allow fine motor control of the body. 1 The cerebellum has a complex neural organization that gives rise to a massive signal-processing capability for accomplishing several types of motor learning. 2 The cerebellum's role in motor function, i.e. its contribution to the execution of coordinated, and precise complex actions, is well recognized. 3 It has been widely accepted that the cerebellum stores internal models to represent input-output properties of a body part. [4][5][6] Two types of internal models have been proposed: forward and inverse models. 4,7 This paper focuses on forward models. In his seminal paper on cerebellum and internal models, Ebner 5 affirmed: "Forward models use the commands for an action and information about the present state to predict the consequences of that action." More recent evidence 8 hypothesized that the cerebellum may use the motor command or "efference copy" signal sent to an effector to overcome the problems associated with long delays in sensory feedback.
Considering the structure of the control system for voluntary movements, Ito 9 depicted a plausible control path of the connections between the cerebral cortex and the cerebellum that has anatomical and physiological bases: the forward model-based control loop. Miall 10 suggested that the cerebellum could act as a Smith predictor 11 for compensation of time delays that occur in the sensorimotor transmission and neural central processing by means of forward models. As a matter of fact, the engineering control strategy known as Smith predictor, which is a type of predictive controller for systems with pure time delay, is based on the forward model of the robot plant; it provides a fast prediction of the outcome of a motor command and a delayed copy of that prediction, which will match in time the actual feedback arising from the movement. However, no study has confirmed Miall's hypothesis; no control loop merging a computational model of the cerebellum and an internal forward model acting as a Smith predictor has ever been presented. Twenty years later, Porrill et al. 12 argued Miall's hypothesis appeared to be in principle biologically implausible because of the constraint of having a suitable teaching signal. Nonetheless, they also affirmed: "However, even for plausible schemes, such as forward models for noise cancellation and novelty-detection, and the recurrent architecture for adaptive inverse control, there is unlikely to be a simple mapping between microzone function and internal model structure". 12 In contrast with the previous statements, we focused our investigation on conceiving a plausible mapping by integrating the Smith predictor strategy into the biological recurrent control loop 13 while preserving the intrinsic and suitable teaching signal.
In this study, a methodology for integrating the Smith predictor mechanism with a cerebellar-like network has been devised. The former includes forward model-based learning that provides sensory predictions to the latter, which automatically supplies corrective control signals, making the control adaptive. Any biological system has an inherent delay from when a command is issued to a muscle to when the response of the limb is received. In this work, the delays are the ones that arise from the operation of the communication system and from the motor feedback. The main objective of this work is to gain insights into the sensory signal processing in motor control by answering the following questions: (i) In the proposed architecture, what is the role and connectivity of the forward cerebellar model? (ii) How are the sensory signals distributed in order to drive the learning and corrections when the Smith predictor is combined with the cerebellar-like network?
The forward model and the cerebellar-like microcircuit were reproduced by combining machine learning and computational neuroscience techniques. The control scheme was then tested on a real physical platform, the Fable modular robot, 14 combining basic modules that led to a mutual influence in the joint dynamics. In this context, a real modular robot represents a suitable benchmark platform to investigate the role of the cerebellum in anticipatory control and the relation with its internal modularity. We propose a cerebellar-based modular controller that is capable of effectively reducing the errors in a few iterations of the same task for both single and double robot module configurations.

Background
Since the late 1960s, several authors, such as Marr (1969), 15 Albus (1971), 16 and Fujita (1982), 17 have shared the view of the cerebellum as an adaptive controller. The use of cerebellar models for robotic control has become an active research area for developing accurate low-gain control schemes for robotic platforms with several degrees of freedom and compliant joints (see the review on cerebellar control 18 ). For example, these models have been used in robotic manipulation tasks, where the cerebellum abstracts dynamic models of the robotic platform and provides adaptive feed-forward corrections to the imprecise control signal obtained from an inverse dynamics controller 19,20 or feedback error learning controller. 21 Several works have been carried out on the cerebellar internal model theory, [22][23][24][25][26] and two main control alternatives have been advocated, which are known as feed-forward 21,27-30 and recurrent control loops; 13,20,26,31,32 a comparison between the two schemes is also available in the literature. 33 Biologically plausible cerebellar algorithms have been successfully embedded in the aforementioned loops for validating the cerebellar adaptive functions in different tasks, e.g. in arm manipulation adaptation under perturbations, 19,21,30,34,35 in eyeblink classical conditioning, 36,37 in gaze stabilization through the vestibulo-ocular reflex with a humanoid robot, 29,38 in gain and timing adaptation for a target reaching task, 39,40 and in minimizing sensory discrepancy and cancelling noise. 31 Although these cerebellar-like circuits differ in their synaptic plasticity models or network models, they all analyze how the cerebellum is able to adjust the motor output or to predict the action and minimize the sensory discrepancy.
The recurrent cerebellar loop, based on the use of the forward model, was originally proposed by Porrill et al. 31 to simplify the adaptive control of nonlinear and redundant biological motor systems; in their experiments, the cerebellum had to compensate for miscalibrations in a planar arm with two degrees of freedom (DOFs). Another demonstration of how the recurrent adaptive control algorithm can successfully improve the performance of a robotic implementation for gaze stabilization was given in Ref. 32. The authors concluded that the architecture was able to adapt to a time-varying plant and could therefore be employed in general engineering applications where sensory feedback is very noisy and/or delayed. Luque et al., in 2011, 20 were among the first to evaluate the capability of a spiking cerebellar model embedded in the recurrent loop architecture to control a robotic arm with three degrees of freedom. Two years later, Tolu et al. 13 proposed a new bio-inspired recurrent architecture in which the feedback-error-learning 41 and adaptive predictive control strategies were combined. Both of these studies reported the advantage of employing the recurrent architecture to ensure a fast convergence in learned profile curve dynamics. In contrast with Ref. 20, Tolu et al. 13 guaranteed the system capability to be robust against both dynamics and kinematics transformations even in nonlinear plants with seven DOFs. Although of significant importance, this study did not fully exploit the potential of the forward model predictions, as they were just used to provide an input-space representation to the cerebellum-like network and not for improving the corrections based on the prediction error.

Paper outline
Section 2 gives an explanation of the adopted control strategy and a description of each main component with the equations included. In this regard, the properties and characteristics of the machine learning engine, of the cerebellar microcircuit, and of their connection are presented and analyzed. Section 3 outlines the four sets of experiments, which include highly dynamic tasks under varying modular set-ups. Finally, in Secs. 4 and 5, we discuss the performance and the robustness of the proposed control method in learning forward models and compensating for deviations in the target trajectory, together with concluding remarks.

The cerebellar-like adaptive Smith predictor
The proposed control strategy is illustrated in Fig. 1(c) and it will be referred to as the cerebellar-like adaptive Smith predictor in the rest of this paper. A set of four main blocks is created for each active robot module and each block is also internally split up in terms of DOFs. The blocks are: the trajectory planner, the controller, the Unit Learning Machine (ULM), and the robot plant. Two previous control strategies presented in the literature have been combined. The first one is shown in Fig. 1(a) and it is known as the recurrent adaptive cerebellar-based control loop (RAFEL). 13 In RAFEL, inside the ULM or cerebellar-like microcomplex, the Forward Model plays the role of the granular cerebellar layer that sends the preprocessed weights $W_{Gk}$ ($k \in [1, K]$, where $K$ is the number of receptive fields (RFs)) to the Readout Plasticity Units (RPUs) sub-component that includes the Purkinje cerebellar layers. The cerebellar corrective output $q_c$ of the RPUs is summed with the feedback error $e_{fb}$ into a total error signal $e_{tot}$ that is converted by the controller into a torque command $\tau$ to the joint actuators. The second control strategy is the Smith-based control mechanism shown in Fig. 1(b). It is based on a fast internal feedback loop (the forward model predictions $Q_{pr}(t)$) and on an external feedback loop in which errors are converted by the controller into a torque command $\tau$ to the joint actuators. There are no cerebellar-like corrections acting in the loop.
Further details about each block in terms of learning and prediction mechanisms, processing flow, and interconnections, are given in the following subsections.

The control loop description
The trajectory planner delivers the reference, i.e. the desired joint positions $Q_{r,j}(t)$ for the Fable robot to follow. Ideally, if there were no noise, disturbances, or delays during the functioning of the system, the desired positions and velocities and the actual outcomes from the robot plant would match thanks to the torque control command provided by the controller. In practice, a correction that accounts for any mismatch in the robot plant output has to be added to the desired movements, and the sum is converted into a motor command signal $\tau$ by the controller. A delayed copy of the forward model prediction, $Q_{pr}(t-\delta)$, is compared with the previous sensory feedback $Q(t-\delta)$ output by the robot plant. The result of this comparison is fed into the system to correct the movement if necessary. The equations presented below refer to position contributions. Velocity contributions can be derived in a similar way, but they are not made explicit for clarity.
In Fig. 1(c), the controller receives the control input $e_{tot,j}(t)$, i.e. the sum of the following contributions:
$$e_{tot,j}(t) = e_{tpr,j}(t) + e_{pr,j}(t) + q_{c,j}(t) \qquad (1)$$
where $e_{tpr,j}(t)$ is the trajectory joint error, $e_{pr,j}(t)$ is the prediction joint error, and $q_{c,j}(t)$ is the deep cerebellar nuclei (DCN) cell behavior or cerebellar joint corrective position computed as in Eq. (8). Each term is associated with the $j$th joint. The torque command $\tau(t)$ for the $j$th joint is obtained by Eq. (2). The control signal is thus a sum of three terms: a first term that is proportional to the position error, a second term that is proportional to the velocity error, and a third term that is proportional to the derivative of the velocity term. The controller parameters for each joint $j$ are: the proportional gain $k_p$, the derivative gain $k_d$, the integral gain $k_i$, and the natural frequency $w_j$ that determines the speed of response of joint $j$. The parameter values are shown in Table 1 for each joint and they were intentionally tuned to make the controller initially stable, but neither optimal nor robust to disturbances, during the execution of the task. Indeed, the purpose of the work is to show how the RPUs adaptive module and the Smith predictor strategy together contribute to improving the control in an adaptive way. Both joint velocity and position corrections have been taken into account for adaptive control.
Table 1. The proportional ($k_p$), derivative ($k_d$), and integral ($k_i$) gains for joints 1 and 2. The natural frequency $w_1$ (joint 1) is equal to 4.5, and $w_2$ (joint 2) is equal to 5.0.
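As a rough illustration of how the delayed prediction and the delayed feedback enter the control input, the following Python sketch assembles the total error for one joint. It is a minimal sketch under stated assumptions, not the implementation used on the robot: the `DelayLine` helper, the sign conventions of the two error terms, and the delay expressed as a whole number of samples are all hypothetical.

```python
from collections import deque


class DelayLine:
    """Buffer that returns the value pushed `delay` samples ago (hypothetical helper)."""

    def __init__(self, delay, initial=0.0):
        # delay + 1 slots, so that index 0 always holds the sample from `delay` steps back.
        self._buf = deque([initial] * (delay + 1), maxlen=delay + 1)

    def push(self, value):
        self._buf.append(value)  # the oldest sample is dropped automatically
        return self._buf[0]


def total_error(q_ref, q_pred, q_pred_delayed, q_meas_delayed, q_corr):
    """Sum the three contributions of the control input for one joint.

    Assumed sign conventions (not stated in the text):
      e_tpr = reference minus current forward-model prediction
      e_pr  = delayed measurement minus delayed prediction
    """
    e_tpr = q_ref - q_pred
    e_pr = q_meas_delayed - q_pred_delayed
    return e_tpr + e_pr + q_corr
```

With these conventions, the prediction error term vanishes whenever the delayed prediction matches the delayed measurement, so the controller effectively acts on the undelayed prediction plus the cerebellar correction.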

The unit learning machine
The ULM is a set of structured cerebellar neural circuits that encodes internal models in order to precisely perform the control of the body part. 42 Forward models are formed and adjusted through a supervised learning process as the movement is repeated to mimic the behavior of a natural process, 43 to facilitate more precise coordinated movements 44 and to increase the system control compliance. 45 The main advantage of this approach is that it allows the design of control systems that can operate in unknown or changing environments, when the dynamical robot model is unknown 13,21 or every time a subject learns to use a new tool or uses different tools. 23 The network of ULMs is inspired by the cerebellar modular neural organization proposed in Ref. 46; the modularity of the ULMs is achieved by replicating them according to the number of robot modules ($N$) with an identical parameter configuration; one set of ULMs is dedicated to position terms and another one to velocity terms, but this is not explicitly depicted in Fig. 1 for clarity. Each ULM artificially reproduces the main cerebellar layers of the canonical microcomplex circuit shown in Fig. 2, which were grouped in two sub-blocks: the Forward Model and the RPUs. The Forward Model and RPUs components that are dedicated to position learning are separated from the ones for velocity learning. The Forward Model sub-block is dedicated to learning the internal forward model of a single robot module, but it receives the sensory input data from every robot module that is active and connected. If another robot module is connected to the active ones, an extra ULM is created and all the ULMs start receiving data from it as well. The RPUs output inside the ULM block is the adaptive cerebellar correction for each robot module as represented in Figs. 1(a)-1(c).
Fig. 2. MFs that originate from the spinal cord and from a wide range of nuclei in the brain stem, namely PN, distribute multisensory information, which refers to the desired movement, motor commands, and the actual state of the limbs, onto different GCs. GCs excite the main output of the cerebellar cortex, the Purkinje cells (PCs), via parallel fiber (PF)-PC synapses. The other afferent of the cerebellum, the CF, arises from the IO and is thought to convey an error computed during the movement, thus providing the teaching signal for the cerebellar learning. Other cell types such as the Golgi, the basket, and the stellate cells are not depicted in the figure for simplicity.
of RFs. 13 The RFs are generated online during the execution of the trajectory. The Locally Weighted Projection Regression (LWPR) algorithm 47 learns the forward dynamic model of a robot module while the module is subjected to corrections during the reaching task. 13 The LWPR engine thus runs in the Forward Model sub-block inside the ULM with multiple functions: (1) it learns the forward model of the robot; (2) it predicts the future state $Q_{pr}(t)$ of the robot module; (3) it computes the granular weights $W_{Gk}$ from the sensorimotor information coming through the MFs and provides them to the RPUs.
Rather than a global nonlinear function approximator, such as Gaussian Process Regression (GPR) or Support Vector Regression (SVR), the LWPR method has been selected since it approximates nonlinear functions in high-dimensional spaces at a low computational cost. It automatically and efficiently determines the required number of local correlation modules, optimizing the network size, through an incremental learning method based on Partial Least Squares (PLS) regression. Because of this, the LWPR algorithm does not require any prior knowledge of the data, thus allowing fast online learning. Moreover, there is no restriction on the number of different tasks that can be retained or on the interposition among them. All of these properties facilitate the cerebellar learning accuracy and speed, thus confirming the hypothesis of Ref. 31 that they could be improved by an optimized granular sparse code, i.e. a neural code in which the ratio of active neurons is low at any time. Furthermore, more recent studies on cerebellar physiology link the PC responses necessary for motor learning to learning-related neural responses in granular cells. Thus, the learning capability of the cerebellum is highly correlated with the changes in the granular layer during motor control. 8 Bearing in mind the similarity between the LWPR learning mechanism and the cerebellar circuit (see Fig. 1 in Ref. 21), each LWPR module and its associated RF weights can be seen as providing the firing rate of a PF, while the set of active RF weights can be seen as the current state of the granular layer processing module.
From Fig. 1(c), the $Q_{pr}(t)$ outputs of the Forward Model sub-block represent the predicted future states or positions of the robot and they are used to drive a fast internal feedback control. A schematic representation of the input/output signals of the learning and prediction functions of the LWPR algorithm is depicted in Fig. 3. The learning function is trained to acquire the forward model of the robot module. The predict function is called to estimate the future state $Q_{pr,M}(t)$ of the robot module $M$ based on the previous robot plant state $(Q(t-\delta), \dot{Q}(t-\delta))$, the previous desired robot plant state $(Q_r(t-\delta), \dot{Q}_r(t-\delta))$, and the previous efferent command copy $\tau(t-\delta)$ sent to the robot, where $\delta$ is the inherent delay of the control of the system.
Besides, the LWPR incrementally divides the input space into a set of $K$ RFs whose time-varying parameters $W_{Gk}(t)$ are defined in (3) by the center $c_k$ and a Gaussian area characterized by a positive definite distance matrix $D_k$:
$$W_{Gk}(t) = \exp\left(-\frac{1}{2}\,(x_i(t) - c_k)^{T} D_k\,(x_i(t) - c_k)\right) \qquad (3)$$
where $x_i(t)$ are the input data points at sample $t$ ($i \in [1, M]$, $M$ is the total number of inputs), and $k$ is a local model or RF ($k \in [1, K]$, $K$ is the total number of RFs). At each sample $t$, the $x_i(t)$ points are assigned to the closest RF based on its $W_{Gk}(t)$ weight activation, and consequently, the center $c_k$ is incrementally updated. The number of local models increases with the complexity of the input space. If an input data point falls into the validity region of a local model, the distance matrix and regression parameters of that model are updated. As in the cerebellar processing flow, the $W_{Gk}(t)$ RF weights (3) are processed by the RPUs sub-component. The global forward model output $Q_{pr,j}(t)$ for the $j$th joint of a robot module is computed as the weighted mean of the predictions $\hat{q}_k(t)$ of all the linear local models created:
$$Q_{pr,j}(t) = \frac{\sum_{k=1}^{K} W_{Gk}(t)\,\hat{q}_k(t)}{\sum_{k=1}^{K} W_{Gk}(t)} \qquad (4)$$
In other words, the input data points $x_i(t)$ enter every RF (local model), each of which provides the $\hat{q}_k(t)$ prediction seen in (4).
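The receptive-field activation and the weighted-mean prediction described above can be sketched in a few lines of plain Python. This is a hedged illustration that assumes the standard LWPR Gaussian kernel, exp(-0.5 (x - c)^T D (x - c)); the function names are hypothetical, and the value 10.5 used in the note below is simply the distance-matrix setting reported in the experiments, taken here as a scalar.

```python
import math


def rf_activation(x, center, dist_matrix):
    """Gaussian receptive-field activation exp(-0.5 * (x - c)^T D (x - c))."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    n = len(diff)
    # Quadratic form (x - c)^T D (x - c), written out for a list-of-lists matrix.
    quad = sum(diff[r] * dist_matrix[r][c] * diff[c]
               for r in range(n) for c in range(n))
    return math.exp(-0.5 * quad)


def weighted_prediction(activations, local_predictions):
    """Weighted mean of the local-model predictions (one value per RF)."""
    total = sum(activations)
    if total == 0.0:
        raise ValueError("no receptive field is active for this input")
    return sum(w * q for w, q in zip(activations, local_predictions)) / total
```

For a one-dimensional input at unit distance from the center, `rf_activation([1.0], [0.0], [[10.5]])` is larger than the same call with `[[20.0]]`, which reflects the fact that a bigger D yields a narrower RF.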
The Readout Plasticity Units. Inside the RPUs the following cerebellar layers are represented:
• Purkinje cell (PC) layer: the activity at this layer is a function of the granular layer state. The learned parameters are modified during the execution of the planned movement by the error or teaching signal.
• DCN cells: the response is related to the integration of both excitatory and inhibitory contributions, from MFs and the inferior olive (IO) and from PCs, respectively.
Each PC and DCN is associated with a joint of the robot. Several forms of plasticity mechanisms work in balance at different layers to provide a final corrective control signal to perform the desired robotic task. In more detail, the set of RPUs provides the cerebellar corrections $q_c(t)$, where $t$ denotes the current sample. These corrections are modulated through the teaching signal $e_{fb}(t)$, computed as the difference between the reference joint positions $Q_r(t)$ and the actual joint positions $Q(t)$ of the robot plant, obtained from the encoders of the motors. The dimensionality of the aforementioned vectors is $n$, where $n$ is the number of DOFs of each robot module. Unlike the work in Ref. 13, the cerebellar-like model adopted here was inspired by the biological cerebellar microcomplex circuit shown in Fig. 2, and it includes extra plasticity rules based on the cerebellar theories described in Ref. 48. Thus, inside the RPUs sub-block, the PF-PC, PC-DCN, MF-DCN and IO-DCN synaptic connections are implemented. Our main aim is to maintain the functional cerebellar information processing by using artificial cells with analog activity values. In the context of movement control, several studies 25,42,48,49 suggested that the CFs that arise in the IO may signal the presence of an unexpected sensory stimulus during an action, whose nature depends on the type of the internal model, so they may convey the error signals that drive a long-term depression (LTD)-based learning process at parallel fiber-Purkinje cell (PF-PC) synapses. In the case of forward model learning, the CFs may convey sensory signals. Apart from the PF-PC synaptic plasticity, the plasticity at the DCN synapses can account for learning. 19 DCN neurons are innervated with both excitatory and inhibitory connections: the first from CFs and MFs, and the second from PCs.
Their relationships are still not completely revealed, but it seems that the IO-DCN plasticity provides fast corrections at early stages of learning. 48 The $W_{Gk,j}(t)$ weights are then processed to compute the cerebellar corrections $q_{c,j}(t)$ on the basis of the feedback error or teaching signal $e_{fb,j}(t)$ carried by the CF from the IO associated with the $j$th joint. Thus, the specific $PF_{kj}(t)$ pathway to the Purkinje layer carries the $W_{Gk,j}(t)$ signal to the $PC_j(t)$ synapse. The $PC_j(t)$ output represents the firing rate of the PCs associated with the $j$th joint and it is modeled as a weighted linear combination of the $W_{Gk,j}(t)$, where $\Delta w_{PF_k\text{-}PC_j}(t)$ represents the weight change between the $k$th PF and the target PC associated with the $j$th joint, $\beta$ is a small positive learning rate set to $5 \cdot 10^{-3}$ ($7 \cdot 10^{-4}$ for the velocity configuration) in all the experiments, and the $CF_j(t)$ signal corresponds to the $e_{fb,j}(t)$ teaching signal. According to Fig. 2, the DCN receives input signals from two differentiated pathways. The first pathway comes from the cerebellar cortex through PCs and the second is through MFs and CFs. The roles of these pathways in motor control were discussed in Ref. 48. Equation (8) describes the DCN cell behavior, where $DCN_j(t)$ represents the firing rate of the DCN cell associated with the $j$th robotic joint; $W_{MF\text{-}DCN_j}$ is the synaptic strength of the MF-DCN connection of the $j$th joint; $W_{PC\text{-}DCN_j}$ represents the synaptic strength of the PC-DCN connection of the $j$th joint; $CF_j(t)$ corresponds to the $e_{fb,j}(t)$ teaching signal associated with the $j$th joint; and $W_{IO\text{-}DCN_j}$ represents the synaptic strength of the IO-DCN connection of the $j$th joint. The sensory information is then related to the corrective cerebellar output $DCN_j(t)$. We have integrated into our previous cerebellar-like circuit 13 the rules described in Eqs. (9), (10) and (11), which represent the $W_{PC\text{-}DCN_j}$, $W_{MF\text{-}DCN_j}$, and $W_{IO\text{-}DCN_j}$ synaptic plasticities of the $j$th joint. However, some changes related to the parameters and signals suggested in Ref. 48 for the robotic simulations were applied in the implementation with the real robot. All the synaptic strengths are progressively adapted during the learning process according to different synaptic plasticity mechanisms, which are represented by Eqs. (9)-(11). In Eq. (9), $LTP_{MAX}$ and $LTD_{MAX}$ are the maximum long-term potentiation/long-term depression values set to $1.0 \cdot 10^{-4}$ and $1.0 \cdot 10^{-3}$, respectively ($1.0 \cdot 10^{-5}$ and $1.0 \cdot 10^{-4}$ for the joint velocity configuration); $\alpha$ is the LTP decaying factor, which was set to 10 in order to maintain stable behavior rather than a fast-decaying LTP action; a faster LTP action (1000), as proposed in Refs. 19 and 48, affects the control stability of the system. In Eq. (10), $LTP_{MAX}$ and $LTD_{MAX}$ are the maximum long-term potentiation/long-term depression values set to $1.0 \cdot 10^{-4}$ and $-1.0 \cdot 10^{-3}$, respectively ($1.0 \cdot 10^{-5}$ and $-1.0 \cdot 10^{-4}$ for the joint velocity configuration); $\alpha$ is the LTP decaying factor, which was set to 10 in order to maintain stable behavior rather than a fast-decaying LTP action.
In Eq. (11), $MTP_{MAX}$ and $MTD_{MAX}$ are the maximum modulating plasticity terms set to $1.0 \cdot 10^{-3}$ and $-1.0 \cdot 10^{-6}$, respectively, for both joint position and velocity configurations; $\alpha$ is the MTD decaying factor, which was set to 10 in order to maintain a positive ratio between behavior and a fast-decaying MTD action. The $CF_j(t)$ activity in the IO-DCN plasticity represented the feedback joint position error $e_{fb}(t)$ instead of the normalized joint feedback error used in Ref. 48. In order to both preserve the control loop stability and refer to the bio-inspired control system structure for voluntary movements, 50 the feedback error learning theory presented in Ref. 51 was taken into consideration. In our work, the feedback error learning mechanism is obtained by using the Smith predictor strategy, whose anticipatory actions contribute toward achieving a fast improvement of the performance and a fast consolidation of cerebellar learning. As a matter of fact, the cerebellar-like learning is facilitated by the supervised learning in terms of the feedback error carried through the CF in conjunction with the PF-PC synapses. 41 In addition to this, the prediction error obtained by comparing the internal forward model outcome with the current robot plant outcome indirectly contributes to the learning and adaptation of the ULM module through the generation of the optimal control input $\tau$.
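The readout stage described in this subsection can be sketched for a single joint as follows. Only two things are taken from the text: the PC output is a weighted linear combination of the granular weights, and the learning rate is 5e-3 for the position configuration. Everything else is an assumption made for illustration: the function names, the form of the PF-PC update (learning rate times teaching signal times granular weight), and the DCN expression with excitatory MF and IO terms and an inhibitory PC term. The plasticity rules of Eqs. (9)-(11) are not reproduced.

```python
def purkinje_output(pf_pc_weights, granular_weights):
    """PC firing rate as a weighted linear combination of the granular weights."""
    return sum(w * g for w, g in zip(pf_pc_weights, granular_weights))


def update_pf_pc(pf_pc_weights, granular_weights, cf_signal, beta=5e-3):
    """Return updated PF-PC weights.

    Assumed rule: delta_w_k = beta * CF * W_Gk, with CF set to the feedback error.
    """
    return [w + beta * cf_signal * g
            for w, g in zip(pf_pc_weights, granular_weights)]


def dcn_output(pc, cf_signal, w_mf_dcn, w_pc_dcn, w_io_dcn):
    """Assumed DCN response: excitatory MF and IO terms minus the inhibitory PC term."""
    return w_mf_dcn + w_io_dcn * cf_signal - w_pc_dcn * pc
```

In this sketch a persistent feedback error keeps pushing the PF-PC weights of the currently active RFs in the same direction, which is one simple way to picture how the correction builds up over repeated iterations of the same trajectory.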

The modular robotic system
Fable is a modular robot consisting of detachable modules. 14 The modules can be active (with motors) or passive and can be assembled together to shape configurations that differ in morphology and topology. Each module has two actuated revolute joints. In this work, tests were conducted on a single active module with 2 DOFs and on two connected active modules (4 DOFs) in the configuration setup shown in Fig. 4(a). Joints are actuated by Dynamixel AX-12A motors with an accuracy of 0.29°. Modules receive commands wirelessly from the user computer, which is serially connected to a dongle. The dongle provides a shared 2 Mbit radio communication link between the user-controlled application and the modules. Modules are addressed using an ID and their module type. The web services (API) allow distributed control; a separate thread is executed on the computer for each module controller. The system application, which the user can run locally on the computer, has been developed using Python v.3.4. The user program calls simple API functions on the remote modules through the dongles connected to the computer, as shown in Fig. 4(b).
The control of the Fable robotic system has an inherent delay: each control-loop execution takes approximately 5 ms on a computer; once the torque is estimated, it is sent to the joint through the dongle, which has a latency of about 1 ms. Besides this, there is an unknown variable delay related to the operating system, which is lower than 0.1 ms.

Experiments and Results
The outcome of the three control architectures shown in Fig. 1 has been compared and analyzed in this section. In all the following tests, the two-joint modules had to follow the figure-eight trajectory described by equations in angular coordinates with amplitude parameter A = 2000; the amplitudes of joints 1 and 2 are approximately 50.0° and 25.0°, respectively. Each completed figure-eight movement constitutes an iteration, whose duration is 1 s (the sampling time is equal to 5 ms). The performances of the architectures depicted in Fig. 1 are analyzed by executing four different tasks that allow proper assessment of the learning capability of the cerebellar model.
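To make the timing of one iteration concrete, the sketch below generates one figure-eight reference at the stated sampling time, giving 200 samples for a 1 s iteration. The parameterization is an assumption: a generic 1:2 Lissajous curve with the approximate joint amplitudes quoted above (50° and 25°) is used, which is not necessarily the form of the paper's trajectory equations, and the function name and arguments are hypothetical.

```python
import math


def figure_eight(n_samples=200, dt=0.005, freq_hz=1.0,
                 amp1_deg=50.0, amp2_deg=25.0):
    """Return a list of (joint1, joint2) reference angles in degrees.

    Assumed shape: joint 2 oscillates at twice the frequency of joint 1,
    which traces a figure-eight in the joint plane.
    """
    trajectory = []
    for n in range(n_samples):
        t = n * dt
        q1 = amp1_deg * math.sin(2.0 * math.pi * freq_hz * t)
        q2 = amp2_deg * math.sin(4.0 * math.pi * freq_hz * t)
        trajectory.append((q1, q2))
    return trajectory
```

Changing `freq_hz` or the two amplitudes between calls gives a simple way to emulate amplitude or frequency switches of the reference, again under the same assumed parameterization.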
In each task, the performance is measured by computing the Mean Absolute Error (MAE) between the reference joint trajectory ($Q_{r,j}$) and the actual robot joint outcome ($Q_j$) along the whole trajectory. During all the trials, there is no offline learning, i.e. the learning of the forward model starts when the simulation begins, so the cerebellum-like weights are activated; however, the Smith prediction as well as the cerebellar corrections are applied to the system only after one initial iteration of the figure-eight. This provides the initial stability given by the feedback controller for the learning phase. The robot module batteries have to be fully charged in order to get the best performance.
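The performance measure can be written out as a short Python sketch. The per-joint MAE follows directly from the definition above; the helper that averages the per-joint values is an assumption about how the "averaged MAE" reported in the experiments is formed, and both function names are hypothetical.

```python
def mean_absolute_error(reference, actual):
    """MAE between a reference and an actual joint trajectory over one iteration."""
    if not reference or len(reference) != len(actual):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(abs(r - a) for r, a in zip(reference, actual)) / len(reference)


def averaged_mae(per_joint_mae):
    """Average of the per-joint MAE values (assumed definition of the averaged MAE)."""
    return sum(per_joint_mae) / len(per_joint_mae)
```

For example, `mean_absolute_error([0.0, 1.0, 2.0], [0.5, 1.0, 1.0])` averages the absolute deviations 0.5, 0.0 and 1.0 over three samples.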
1. Basic Task. Basic operation with one robot module. The robot module stands in a vertical position and the initial joint angles are set to zero. The figure-eight movement is repeated for 200 iterations with a fixed frequency and amplitude.
The first experiment simply compares the performance of the three architectures shown in Fig. 1. Results for the proposed architecture can be found in Fig. 5. It can be observed (black line) that the adaptive and predictive control method is able to adapt to the robot dynamics, making the averaged MAE decrease to 0.70° at the last iteration.
The effectiveness of the control can be observed by comparing the result with the one obtained without the cerebellar-like outcome (brown dot-dashed line). The outcome from the RPUs is actually what leads to finer adjustments in the control. The task was also executed without the Smith predictor (red dashed line) to show that the contribution of the RPUs helps to anticipate the correction from the very beginning of the trial (see Fig. 5). The experiment was repeated eight times for the three control loops and the standard deviation (STD) values corresponding to the minimum MAE values are indicated in Table 2. They are an indicator of how much, on average, measurements differ from each other. The RPUs corrections help to keep the STD values low, and the accuracy is higher than in the case without them.
Table 3. Testing the performance of the 1-module basic task by varying the distance matrix D, which is inversely proportional to the diameter of the RFs. The table data show the D matrix value, the number of RFs, the minimum MAE value for each joint and averaged among the joints, and finally, when the latter occurred during the whole executed simulation. The best result is highlighted in bold.
The number of RFs created by the LWPR was 11 for the position states (see Table 3). The distance matrix D_k was set to 10.5 for all the experiments. The initial D_k value, which describes the size of the newly created RFs, was chosen by analyzing the prediction error obtained, as shown in Table 3. Smaller values of D yield wider RFs, which oversmooth at the start of training and lead to slow convergence until they span the entire input space; this also tends to reduce the generalization ability. Larger values of D yield smaller RFs, which might lead to fast convergence, but also to overfitting.
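The relation between D and the RF size can be illustrated with the Gaussian kernel commonly used in LWPR, w = exp(-0.5 (x - c)^T D (x - c)). The sketch below assumes an isotropic distance matrix D = d I with a scalar d (for example d = 10.5) and an activation threshold of 0.1 for delimiting an RF; the threshold value and the function names are illustrative assumptions, not settings reported in this work.

```python
import math


def rf_activation(x, c, d):
    """Gaussian RF activation for input x, center c and scalar metric d (D = d * I)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-0.5 * d * sq_dist)


def rf_radius(d, threshold=0.1):
    """Distance from the center at which the activation drops to `threshold`.

    The threshold is a hypothetical choice made only for this illustration.
    """
    return math.sqrt(2.0 * math.log(1.0 / threshold) / d)
```

Under these assumptions, `rf_radius(10.5)` is smaller than `rf_radius(1.0)`, which matches the statement that larger D values produce smaller RFs.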

2. Trajectory switching task. Trajectory switching with one robot module in the same initial configuration as task 1.
The second set of experiments focuses on the controller response to a set of sinusoidal inputs at different amplitudes and frequencies that are changed online every 30 iterations of the figure-eight movement. In the first test, the amplitude of the movement was changed every 30 iterations, with A ranging in the [2200, 2000, 1800] set for both joints, while the frequency remained constant. Results for this experiment are shown in Fig. 6, where it can be observed that the predictive architecture is robust to the changes during the executed action, since the error decreases faster after a switch, especially in the advanced learning phase. The number of RFs increased to 20 in this experiment.

Fig. 6. Test of amplitude change carried out with the model architecture in Fig. 1(c). The sequence was: A = [2200, 2000, 1800, 2000, 2200, 2000, 1800]. The switches occur after every 30 iterations.
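The amplitude switching protocol can be sketched as a simple lookup on the iteration index. This is an illustrative sketch only: the zero-based iteration counter, the function name, and the choice of holding the last value once the sequence is exhausted are assumptions, not details given in the text.

```python
# Amplitude sequence reported for the test in Fig. 6.
AMPLITUDES = [2200, 2000, 1800, 2000, 2200, 2000, 1800]


def scheduled_value(iteration, sequence, period=30):
    """Value of a parameter that switches every `period` iterations.

    `iteration` is assumed to start at 0; the last value is held once the
    sequence is exhausted (an assumption made for this sketch).
    """
    index = min(iteration // period, len(sequence) - 1)
    return sequence[index]
```

The same helper could serve the frequency switching test by passing the frequency set instead of the amplitude sequence.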
In the second experiment, the amplitude of the movement remained constant, while the frequency was changed after every 30 iterations, ranging in the [0.9, 1.0, 1.2] Hz set. Figure 7 shows the behavior obtained with one robot module. When a change occurred, the cerebellar-like adaptive Smith predictor performed well in terms of averaged MAE, which decreased around 1°. The number of RFs also increased to 20 during this experiment despite the fast changes of the frequency. Thanks to the previous learning acquired by the RPUs during the changes, the system is able to provide fast adjustments and maintain a stable behavior, which is especially noticeable after 100 iterations.

3. Dynamic changes task. The robot module stands in the same initial configuration as in task 2.

In this task, the capabilities of the proposed architecture to react to changes in dynamics were tested by applying an external load on the robot end-effector. The load weighed 105.9 g and was applied by hand from the beginning of the trial in the experiment shown in Fig. 8, and after 40 iterations in the test shown in Fig. 9.
It can be noticed that the cerebellar-like adaptive Smith predictor does not seem to be influenced by the change in dynamics, and the cerebellum is actually the one lowering the error within a few iterations of the load application. Moreover, the initial MAE value is lower than the ones obtained without cerebellar contributions. As a matter of fact, the cerebellum is able to apply adaptive corrections thanks to the previous experience and learning, which is not possible with the Smith predictor alone. However, their combination helps refine the corrections to get a faster, robust performance.

4. Basic task with two Fable modules. Basic operation and dynamic changes with two robot modules. The amplitude and frequency of the figure-eight movement are fixed and repeated by two separate robot modules. The dynamics of both robot modules change.
The final set of experiments focuses on the controller capability to deal with the dynamic disturbances caused by the connection of two modules and demonstrates its effectiveness for modular robots. In this task, two Fable modules were concatenated one on top of the other, forming a column structure (Fig. 4(a)), and oriented top-down to overcome the payload engine limitations. Both joint modules had to follow the figure-eight trajectory given by equation 12. The results in Fig. 10 were obtained with the two modules attached together from the beginning, whereas in the experiment of Fig. 11 the two modules were initially separated and were joined only after 30 iterations. In Figs. 10 and 11, the averaged MAE among joints is around 2° and 1° at the last iteration for modules 1 and 2, respectively, when using our proposed control method. In Fig. 10, the outcome of the three control architectures in Figs. 1(a), 1(b) and 1(c) was compared. Again, the combination of the Smith predictor and the cerebellar feed-forward controller is the one making the difference in the control during the experiment compared to the other control strategies. The number of RFs created by the LWPR was equal to 20 for the experiment in Fig. 10 and 27 for the experiment in Fig. 11. Finally, the outcome in Fig. 11 highlights the high contribution of the cerebellum-like network with the Smith predictor strategy. Indeed, the cerebellar-like corrections enable a better robot performance with the Smith predictor from the beginning of the test and a fast reduction of the error after the two modules are concatenated.

Forward cerebellar model contribution to robustness and generalization capabilities
As anticipated, our findings support the hypothesis that the cerebellum could use the Smith predictor strategy based on forward model learning to provide fast and accurate corrections. The sensory signals were distributed in order to drive the learning and compensate the time delays inherent to the control system.
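To make the delay compensation idea concrete, the following sketch simulates a textbook Smith predictor on a first-order discrete plant with a delayed measurement. It is only an illustration under strong assumptions: the plant parameters, the PI gains, the delay length and the fixed linear forward model are all hypothetical, and the code includes neither the LWPR forward model learned online nor the cerebellar-like corrections used in this work.

```python
from collections import deque


def simulate_smith(steps=300, delay=10, a=0.9, b=0.1, kp=2.0, ki=0.2, r=1.0):
    """Track a constant reference r despite a sensory delay of `delay` steps."""
    y_plant = 0.0  # true plant output, observed only after the delay
    y_model = 0.0  # delay-free forward-model prediction
    integ = 0.0    # integral term of the PI feedback controller
    # FIFO buffers emulate the delay on the measurement and on the model output.
    meas_buf = deque([0.0] * delay)
    model_buf = deque([0.0] * delay)
    outputs = []
    for _ in range(steps):
        y_meas = meas_buf.popleft()            # delayed sensory feedback
        y_model_delayed = model_buf.popleft()  # model prediction, same delay
        # Smith feedback: delay-free prediction plus the delayed model mismatch.
        feedback = y_model + (y_meas - y_model_delayed)
        error = r - feedback
        integ += error
        u = kp * error + ki * integ
        meas_buf.append(y_plant)
        model_buf.append(y_model)
        # Plant and forward model share the same dynamics here (perfect model).
        y_plant = a * y_plant + b * u
        y_model = a * y_model + b * u
        outputs.append(y_plant)
    return outputs
```

Because the forward model is perfect in this sketch, the mismatch term is zero and the controller effectively acts on the delay-free prediction; with a learned model, the mismatch term is what carries the prediction error back into the loop.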

The test results in Sec. 3 on the performance and validity of our controller in on-line motor control tasks revealed that: (1) the error decreases significantly and fast during the tasks; (2) the control is robust against changes in dynamics or in motion input. The errors are compensated through the generation of the internal forward model, prediction errors, and cerebellar corrections. The results show an improvement in accuracy compared to a classic control approach and that the combination of the Smith predictor with a forward cerebellar model indeed contributes to delivering fine and effective corrections (Fig. 5). The proposed control architecture presents a lower final error than the ones achieved without the Smith predictor and without cerebellar-like corrections during the control of the task. We argue that the remaining angular error is due to limits of the hardware (gear set) and the motor's resolution. It is worth mentioning that the system is undemanding with respect to parameter setup. As a matter of fact, all the experiments were run without any modification of the internal control parameters. Besides, neither an optimal feedback controller nor an analytical robot model is needed.

Implications for cerebellar role in anticipatory control
The results from the experiments with changes in motion (Figs. 6 and 7) and in dynamics (Figs. 8, 9, 10 and 11) have confirmed that the Smith control predictions help to deal with the sensory delay of the control loop and that the modular RPUs adjust the control output to those changes. Indeed, these are central features of cerebellar computation and function, and they have been investigated in a recent work. 52 All the tests were performed on a physical modular robot, in contrast with previous works based on the recurrent cerebellar architecture in simulated robotic tasks. 13,20 Luque et al. (2011) 20 evaluated the capability of a spiking cerebellar model embedded in a recurrent control loop to control a simulated robot arm. In their work, they obtained a minimum MAE of approximately 4° after 100 iterations of the same task. In contrast with their work, here both the feedback-error-learning mechanism and predictive actions of the robot plant were used to enable a better improvement in the robot control performance. This confirms the role of the cerebellum in driving the performance and at the same time getting benefits from the learned output. 8 In contrast with previous approaches, 24,53 there is no offline learning. This has two effects: on the one hand, it allows the system to adapt to changes in dynamics while performing the task; on the other hand, it has the downside of an initial learning phase in which the error is higher. However, it has been shown, with similar controllers, that this issue can be overcome by storing the learned forward model. 29 In any case, the results show that active cerebellar corrections lower the initial errors from the first trials of the experiment. In our model, while the forward model predicts the future state, the cerebellum updates its synaptic weights and computes its corrections based on the same sensory inputs as the former. This could explain how the sensory signals are distributed in order to drive the learning and anticipate corrections.

Concluding Remarks
In this work, a novel cerebellar-like adaptive Smith predictor was obtained by integrating a computational model of the cerebellum circuit embedded in a recurrent loop with a Smith predictor. The proposed control architecture is biologically inspired by the theory of cerebellar internal forward model formation for anticipatory motor control. Other authors tried to provide anticipatory corrective responses through learning from motor errors, but the mechanism they show for correcting feed-forward motor responses is tied to the dynamics of the plant, so it cannot be easily generalized to new robot configurations. These previous works are based on the combination of forward and inverse models without using cerebellar computational algorithms for robot control, such as Wolpert and Kawato, 54 who built a model for sensorimotor learning and control (MOSAIC), with the aim of learning and selecting the most appropriate set of internal models for a given environment. Indeed, a modular approach based on internal forward model pairing would lead to advantages, since the controller would be capable of providing appropriate motor commands for multiple contexts, tasks and experiences.
This work was also inspired by the modularity of the cerebellar network for implementing multiple microzones, each of which is devoted to controlling and learning the internal forward model of a single robot module. Future work will study how the cerebellum processes the sensory inputs and organizes its internal structures for motor control tasks under changing morphologies and topologies. In this case, more complex nonlinearities could be handled by combining traditional PID approaches with biologically plausible methods that give the former the capability of continuously learning by adapting the gains. Some attempts have already been applied in control, such as membrane controllers for trajectory tracking and Spiking Neural P systems for communication strategy among neurons. 55,56 Finally, we argue that the performance could be improved by using advanced techniques for the optimization of control parameters. 57,58