Cerebellar adaptive mechanisms explain the optimal control of saccadic eye movements

Cerebellar synaptic plasticity is vital for adaptability and fine tuning of goal-directed movements. The perceived sensory errors between desired and actual movement outcomes are commonly considered to induce plasticity in the cerebellar synapses, with the objective of improving the desirability of the executed movements. In rapid goal-directed eye movements called saccades, the only available sensory feedback is the direction of the reaching error, received only at the end of the movement. Moreover, this sensory-error-dependent plasticity can only improve the accuracy of the movements, while ignoring other essential characteristics such as reaching in minimum time. In this work we propose a rate-based, cerebellum-inspired adaptive filter model to address the refinement of both the accuracy and the movement time of saccades. We use an optimal control approach in conjunction with the information constraints posed by the cerebellum to derive bio-plausible supervised plasticity rules. We implement and validate this bio-inspired scheme on a humanoid robot. We found that separate plasticity mechanisms in the model cerebellum separately control accuracy and movement time. These plasticity mechanisms ensure that optimal saccades are produced by receiving only the direction of the end reaching error as an evaluative signal. Furthermore, the model emulates the encoding of movement kinematics in the cerebellum, as observed in biological experiments.


Introduction
Primates can carry out precise and fast movements in the complete absence of sensory guidance (Iwamoto and Kaku 2010). The erroneous motions that can result from the absence of sensory feedback are generally hypothesized to be corrected by the acquisition of predictive models in the cerebellum, regarding the state of the environment and of the body itself (Xu-Wilson et al 2009, Stein 2009). However, the acquisition of these models has to be supervised by temporally unspecific error estimates that become available only at a delayed processing stage. For example, in the case of ballistic eye movements called saccades, the total error information delivered to the cerebellum is available only at the end of the eye movements (Hopp and Fuchs 2004). In addition, the feedback received from different movement directions is based only on the scalar sign of the movement error, without any sensory representation of the entire trajectory (Soetedjo and Fuchs 2006).
This poses a question: how can the cerebellar mechanisms achieve supervised fine movement control in the absence of temporally descriptive error information?
Biological systems and their robotic counterparts need to satisfy certain kinematic and dynamic constraints of movement desirability. For example, fast movements can be produced by instantaneous switching between maximum positive and negative control commands, to accelerate and decelerate a given robotic or biological system towards a given goal location. However, this kind of control strategy is not desirable in terms of energy expenditure (Riazi et al 2015) and accuracy in the presence of signal-dependent noise (Harris and Wolpert 1998). These task-related constraints are generally incorporated in robotics and behavioral models (Ivaldi et al 2012, Scott 2012) by means of fitness or cost functions that evaluate the quality of a trajectory in terms of an abstract representation of the movement goal.
However, in light of the aforementioned sensory information constraints, it is unclear how the neural substrate that underlies fast movements, especially in the cerebellum, can implement this fitness evaluation. An elucidation of this problem can facilitate simplified implementations of fast reaching movements, with regard to efficient sensory requirements and supervised control strategies.

Background
A classic modeling strategy used for examining the cerebellum is to consider it as an adaptive filter (Fujita 1982, Dean and Porrill 2011). In this perspective the cerebellum is considered to perform three major signal processing operations: 1. The expansion of input mossy fiber (MF) contextual information into high-dimensional components in the granular layer, which is composed of granule cells (GrC) and Golgi interneurons (GoC); 2. The linear combination of the available granular layer signals transmitted through parallel fibers (PFs) to generate the Purkinje cell (PC) output; 3. The adjustment of the strength of PF-PC connections by means of the teaching signal available as the climbing fiber (CF) activity. Several other sites of plasticity have also been observed (Luque et al 2016), but they are outside the scope of this paper. The applicability of this adaptive filter modeling approach has been verified extensively in tasks such as VOR/smooth pursuit (Dean and Porrill 2011, Franchi et al 2010, Vannucci et al 2015) and arm reaching movements (Spoelstra et al 2000, Tolu et al 2013). These tasks can be viewed in the supervised learning framework, where the task objective is to minimize the continuous error between the desired and executed movements in time. The task error itself is often available as continuous CF activity throughout the movement, albeit delayed. When this error minimization is formulated as a least mean square error reduction problem, the resulting local covariance rules corroborate the plasticity mechanisms of the biological cerebellum (Fujita 1982, Porrill and Dean 2008).
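The three operations listed above can be sketched as a toy adaptive filter. This is a minimal illustration only, assuming a fixed random tanh expansion in place of the granular layer and a continuously available error standing in for CF activity. The function names, the target signal, the layer size and the learning rate are all hypothetical choices, not values taken from the cited models.

```python
import numpy as np

def expand(mf, w_in, bias):
    """Operation 1: expand a scalar mossy-fibre input into N granular-layer signals."""
    return np.tanh(w_in * mf + bias)

def pc_output(pf, w):
    """Operation 2: Purkinje-cell output as a linear combination of PF signals."""
    return float(w @ pf)

def covariance_update(w, pf, error, lr):
    """Operation 3: LMS-style rule, dw_j = -lr * error * pf_j."""
    return w - lr * error * pf

def target(x):
    """Hypothetical signal that the filter should learn to reproduce."""
    return 0.5 * np.sin(2.0 * x)

rng = np.random.default_rng(0)
n_units = 50                                   # illustrative size
w_in = rng.uniform(-2.0, 2.0, n_units)
bias = rng.uniform(-1.0, 1.0, n_units)
w = np.zeros(n_units)

def mse(weights):
    """Mean-squared error of the filter output on a fixed grid of inputs."""
    xs = np.linspace(-1.0, 1.0, 101)
    errs = [pc_output(expand(x, w_in, bias), weights) - target(x) for x in xs]
    return float(np.mean(np.square(errs)))

mse_before = mse(w)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    pf = expand(x, w_in, bias)
    error = pc_output(pf, w) - target(x)       # continuous error, as assumed in these tasks
    w = covariance_update(w, pf, error, lr=0.01)
mse_after = mse(w)
```

Each weight change depends only on the product of the local PF signal and the shared error signal, which is what makes this kind of rule "local".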
A particularly interesting case arises when the task objective and the available error information do not have such a direct relationship (Harris 1998). This lack of highly descriptive error information is evident in a class of fast eye movements called saccades.
Saccades are fast eye movements carried out by primates to bring a given target into the center of the fovea. The striking feature of saccadic eye movements is the suppression of vision during the movement, which eliminates any chance of visual information being employed for online feedback control. At the same time, proprioceptive information has been observed to have little significance during the saccadic eye movement (Lewis et al 2001). In (Harris and Wolpert 1998, Harris 1998), the authors proposed that the highly stereotypical relationship between eye movement amplitude, speed and duration is due to the computational objective, or desirability, employed by the central nervous system to minimize the variance associated with the eye movements. This variance itself is a result of inevitable signal-dependent noise in the motor commands, which is proportional to the magnitude of the command signal. Hence, when the eye is provided with a high motor command for a faster movement, the accuracy in reaching is compromised due to the high noise level. Conversely, low-magnitude motor commands improve movement accuracy but increase the reach time. The resulting eye movement trajectories were considered to be a balance between these two opposing costs.
In most of the cerebellum-specific neural models of saccades in both neuroscience and robotics (Dean et al 1994, Gad and Anastasio 2010, Quaia et al 1999, Antonelli et al 2015), the cerebellum is mainly simulated to account for the accuracy of the eye movement. However, lesions in the cerebellum cause a loss in accuracy and speed as well as an increase in the duration of the movement (Robinson et al 1993), indicating a loss of the stereotypical optimal characteristics. Recent works (Saeb et al 2011, Kalidindi et al 2018) simulate the optimal fast eye movement characteristics by considering the effect of cerebellum-induced adaptation, but do not include the information constraints on the available sensory feedback, such as the absence of continuous movement trajectory information.
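The balance between the two opposing costs can be illustrated with a deliberately simplified toy calculation. It is not the model of Harris and Wolpert: it assumes a constant velocity command, an ideal integrating plant, white command noise whose variance rate scales with the squared command, and a hypothetical linear penalty on duration. Under these assumptions the endpoint variance is c²A²/T, so the summed cost c²A²/T + γT is minimized at T = cA/√γ.

```python
def endpoint_variance(amplitude, duration, noise_gain):
    """Endpoint variance for a constant command u = amplitude / duration,
    assuming white command noise with variance rate (noise_gain * u)**2."""
    command = amplitude / duration
    return (noise_gain * command) ** 2 * duration

def total_cost(amplitude, duration, noise_gain=0.05, time_penalty=1.0):
    """Accuracy cost plus a hypothetical linear penalty on movement time."""
    return endpoint_variance(amplitude, duration, noise_gain) + time_penalty * duration

def best_duration(amplitude, noise_gain=0.05, time_penalty=1.0):
    """Grid search over candidate durations (1 ms to 5 s in 1 ms steps)."""
    durations = [0.001 * i for i in range(1, 5001)]
    return min(durations, key=lambda d: total_cost(amplitude, d, noise_gain, time_penalty))

t_small = best_duration(10.0)   # analytic optimum under these assumptions: 0.05 * 10 = 0.5
t_large = best_duration(20.0)   # analytic optimum under these assumptions: 0.05 * 20 = 1.0
```

In this toy setting the preferred duration grows with amplitude, which is the qualitative flavor of the stereotyped amplitude-duration relationship discussed above. The noise gain and time penalty are arbitrary.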

Contribution
The main contribution of the paper is to show that by optimizing our custom fitness formulation for saccadic eye movements, we can derive plasticity rules that agree with the synaptic plasticity mechanisms in the cerebellum. Mapping movement optimality onto the cerebellar plasticity mechanisms is non-trivial if we take into account that the cerebellum is composed of a large number of neurons and multiple synaptic plasticity mechanisms. Notably, the weight-update rules that emerge from the optimal control formulation follow the local covariance-based synaptic plasticity rules in the cerebellar PF-PC synapses. The approach that we follow is to formulate saccade adaptation as an optimal control problem, while including reasonable constraints posed by the information flow in the biological cerebellum. These constraints are (i) the complete absence of sensory feedback during the movement and (ii) the occurrence of an endpoint error that signifies only the direction in which the eye missed the target, without any detailed magnitude information. Hence the adaptive filter in our model learns to compensate for the missing sensory feedback, while also maintaining the movement optimality or vigor.
Our model indicates that the PC layer activities of the adaptive filter are correlated with the eye movement kinematics, similar to the neurophysiological evidence that the cerebellar PC populations encode a prediction of the eye speed and displacement (Herzfeld et al 2015). This supports the biological relevance of the proposed model, by providing a working hypothesis on the potential plasticity mechanisms that are responsible for movement encoding in the PC populations.
Overall, we demonstrate the ability of the proposed model to enable fast and precise eye movements in the absence of online sensory processing and exact knowledge of the humanoid robot plant. We show that the lack of trajectory information during the movements does not hinder the eye movement adaptation over learning trials.

Computational model and methods
The overall saccade control system (shown in figure 1) is similar to the high-level control architecture presented in (Dean 1995). This control system is mainly composed of the brain-stem component and the cerebellum-based adaptive filter. The adaptive filter uses the information regarding the target, $y_d$, and an afferent copy of brain-stem activity, $v$, as its input. The brain-stem is simulated to reproduce the faulty eye movement characteristics, namely target overshoot with reduced peak speed and increased duration, in the case of cerebellar lesions. Sensory errors experienced during erroneous movements are utilized to improve the motor commands for subsequent movements, by updating the adaptive filter read-out weights (namely the PF-PC connection weights $w_{pf-pc}$).

Saccade control inputs
Neuronal structures such as the superior colliculus, which generates the target movement command and relays error-related information, and the omnipause neurons, which inhibit the eye movements, are not explicitly included in this model. For the camera image processing and target detection (see figure 1), we substituted the target detection capability of the retina and superior colliculus with a tracking model (Taiana et al 2010, Vannucci et al 2014) based on particle filtering methods. This model exploits knowledge of the shape and color of the known object to be tracked (in our case a sphere). In this filter, each particle is a hypothetical state for the object, composed of its 3D position. Particles are weighted through a likelihood function. Color and luminance differences between the sides of the hypothetical object silhouette are indicators of the likelihood of the object pose.

Adaptive filter model
In our computational model, several simplifications have been made to the original cerebellar information processing to make it suitable for robot application. The design considerations behind the proposed simplifications are described in the supplementary material (https://stacks.iop.org/BB/16/016004/mmedia) in section S2.

Model
The adaptive filter model shown in figure 1 mainly comprises: the input information representing the sensorimotor states, referred to as MF inputs; the expansion layer, which replaces a bio-realistic GrC layer and expands the input sensorimotor information into higher dimensions; and the PC layer, which recombines the expanded signals and produces the output through the nucleus (Nuc).
This model follows a rate-based formulation of the underlying neuronal population activity. In this formulation, the neuronal components affect the movement output through their firing rates measured in Hz (where 1 Hz = 1 spike/sec). These neuronal firing rates are simulated to arise from leaky-integrator differential equations with a non-linear saturation at fixed lower and upper bounds on the total input received by the neuronal components. That is, the firing rate of each neuronal component is zero below a specified lower bound, increases with the total input to the component, and saturates at a fixed value once the total input reaches an upper bound. For simplicity, the inputs to each neuronal component are scaled so that the component operates within the saturation limits.
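The rate-based description above can be sketched as a single saturating leaky-integrator unit. This is a minimal illustration, assuming a piecewise-linear saturation and forward-Euler integration. The bounds, the maximum rate, the time constant and the step size are hypothetical values, and the function names are introduced here for illustration only.

```python
def saturating_rate(total_input, lower=0.0, upper=1.0, max_rate=100.0):
    """Rate in Hz: zero below `lower`, rising linearly, flat at `max_rate` above `upper`."""
    clipped = min(max(total_input, lower), upper)
    return max_rate * (clipped - lower) / (upper - lower)

def leaky_step(rate, total_input, tau=0.05, dt=0.005):
    """One forward-Euler step of tau * dr/dt = -r + f(total_input)."""
    drive = saturating_rate(total_input)
    return rate + (dt / tau) * (drive - rate)

# Drive the unit with a constant input inside the saturation limits.
rate = 0.0
for _ in range(400):
    rate = leaky_step(rate, total_input=0.4)
```

With a constant input the unit relaxes towards the saturated drive (here 40 Hz for an input of 0.4), with the time constant setting how quickly it gets there.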
The expansion layer, modeled as an echo-state network, is composed of $N = 300$ leaky-integrator units, represented as an $N \times 1$ vector of neuronal activities $z$. These units receive the saccade sensorimotor context as MF firing rate information, and their neuronal firing rate is represented as:
where $\tau$ is the network time constant, $z(t)$ is the vector of $N$ expansion unit activities at time $t$, and $z_o$ is the background activity of the expansion layer units. Further, the lower limit of each component of $z$ is set equal to 0. $S(t)$ represents the combined excitatory inputs from the MF firing rate activity, composed of $mf_d$ and $mf_u$, and the strictly inhibitory projections, $-wz(t)$, within the expansion layer.
Where $f$ is the non-linear function that bounds the input firing rate to the expansion layer units between fixed lower and upper limits. $w_{in}$ is an $N \times 1$ vector drawn from a uniform distribution over $[-0.5, 0.5]$. $w$ is constructed by drawing $N \times N$ samples from a uniform distribution of strictly non-negative numbers over $[0, 0.5]$, and further dividing each sample by the spectral radius, $\rho$, of the resultant weight matrix. The resulting weight matrix can be denoted as $w = U(0, 0.5)/\rho(w)$. The ratio of the number of non-zero connections to the total possible connections is set to 0.4 to ensure sufficient network sparsity. By normalizing the weight matrix with its spectral radius and by adjusting the sparsity of connections, we verified that the expansion layer activities wash out any network initial conditions, as indicated in (Lukoševičius 2012, Rössert et al 2015).
Figure 1. The difference between the current eye orientation and the target orientation from the image processing step drives the burst generator. The burst generator input is further regulated by the local feedback integrator. The burst generator output is used as a velocity command to drive the robot eye. The model cerebellum comprises the input mossy fibers, the Purkinje cell layer and the nucleus, and influences the burst generator through the nucleus. The gate determines whether or not to move the eyes when a target is presented on the camera.
The expansion layer activity ($z$) is transmitted by means of PF connections ($p$) to the Purkinje cell (PC) layer. The PFs are considered to relay the expansion layer activity for simplicity, thus $z = p$. The modulation in the PC layer activity, $\delta r_{pc}$, for a given PF activity is represented as follows:
where $\delta w_{pf-pc}$ represents the PF-PC weight vector of length $N$. $\delta w_{pf-pc}$ is used instead of $w_{pf-pc}$ to have a general formulation in which there can be non-zero background activity in the PCs, and non-zero synaptic strengths.
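The construction of the expansion layer weights described above can be sketched as follows. This is a minimal sketch under stated assumptions: the sparsity mask is applied before the spectral-radius normalization (the text does not specify the order), and the function name `make_expansion_weights` and the random seed are hypothetical.

```python
import numpy as np

def make_expansion_weights(n_units=300, density=0.4, seed=0):
    """Return (w_in, w): input weights in [-0.5, 0.5] and sparse, non-negative
    recurrent weights scaled to unit spectral radius."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_units)
    w = rng.uniform(0.0, 0.5, size=(n_units, n_units))
    # Assumption: keep roughly 40% of the connections, chosen at random.
    mask = rng.random((n_units, n_units)) < density
    w = w * mask
    # Spectral radius = largest absolute eigenvalue of the recurrent matrix.
    rho = np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w / rho

w_in, w = make_expansion_weights()
spectral_radius = float(np.max(np.abs(np.linalg.eigvals(w))))
connection_ratio = np.count_nonzero(w) / w.size
```

Because the recurrent term enters the dynamics as $-wz(t)$, the non-negative entries of `w` act as strictly inhibitory projections within the layer.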
The total output from the cerebellar adaptive filter, $\delta r_{nuc}$, is available through the nucleus, and is a combination of excitatory MF connections and inhibitory PC connections.
In this paper, we consider $w_{mf-nuc}$ to be a zero vector, implying no direct MF contribution to the nucleus activity. Furthermore, setting $w_{pc-nuc}$ equal to 1 results in the following equation,
PF-PC weights are considered to be the only plastic connection strengths in this model. An appropriate configuration of $w_{pf-pc}$ should be estimated from experienced errors and adaptive filter outputs in a mechanistic way, i.e., from the movement feedback available from physical robot experiments.

PC control of brain-stem burst and motor drive
The PC output from the cerebellum exerts an indirect control on the ongoing motor command. The PCs project onto the cerebellar nucleus, and the nucleus modulates the activity of the brain-stem burst, which subsequently modulates the motor command or drive delivered to the motor neurons. As shown in figure 1, the cerebellum-based adaptive filter is represented as a grouped control block that controls horizontal eye movements. The question that the model faces is whether this single block of adaptive filter activity contributes positively or negatively to the overall brain-stem input, which subsequently corresponds to the total motor drive. Recent experimental observations in (Herzfeld et al 2018) indicate that the direction of action in the motor space of the end-effectors is parallel to the direction of preferred error in sensory space. That is, when errors are experienced in a certain direction during movement to a specific target location, the motor commands in future movements towards the same target are adjusted to increase the drive towards that particular error direction. This additional drive towards the error direction is achieved by the modulation in the corresponding PC activity.
A detailed rationale of these findings, applied to our control block simplification, is presented in the supplementary material in section S1.
Overall, the total contribution of the PC activity ($\delta r_{pc}$) to the net brain-stem burst input ($b^{r}_{in}$) with respect to rightward horizontal movements can be written as:
In terms of the nucleus output, when we apply the transformation from equation (5) for the ipsilateral nucleus activity, we get
In addition to the adaptive filter contribution through the nucleus, the total input to the brain-stem burst includes a signal representing target displacement information and a local feedback integrator. This local feedback integrator ensures that the total motor drive decreases as the movement time increases (Jürgens et al 1981, Scudder 1988). To emulate the effects of this feedback integrator, in our experiments we use the resettable local feedback loop presented in (Dean 1995, Kalidindi et al 2018). By including the effect of the feedback integrator and the target displacement command (presented in equation (8)), the net rightward input to the brain-stem burst can be represented as:
$k$ determines the ability of the local feedback loop, represented by an imperfect integration of the brain-stem output $v$, to account for the eye progress towards a given target location in the absence of the cerebellum. The value of $k$ is derived directly from models of primate eye movements and is approximately equal to 0.7 (Dean 1995). It results in inaccurate eye movements, as the estimation does not take the oculomotor plant characteristics into account. In summary, an increase in the leftward-preferring PC activity results in a decrease in the nucleus activity and the brain-stem burst input, and vice versa.

Brain-stem burst
The control command to the iCub oculomotor system is delivered by the brain-stem burst generator block (see figure 1). The shaping of the brain-stem output is determined by the net contributions $b_{in}$ from the desired displacement, the resettable local feedback integrator, and the adaptive filter. The computation in the brain-stem control block is emulated from (Dean 1995) to have an exponential output response as follows:
where $A$ is the peak amplitude of the brain-stem burst and $\sigma$ is the parameter that determines the non-linearity of the brain-stem response.
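The interaction between the burst generator and the imperfect local feedback integrator can be sketched with a toy simulation. It is a sketch under explicit assumptions, not the controller used on the robot: the burst is taken to be a saturating exponential $A(1 - e^{-b_{in}/\sigma})$ for positive input and zero otherwise, the plant is an ideal integrator of the velocity command, the nucleus contribution is a hypothetical constant, and the value of $\sigma$ is arbitrary. Only $k = 0.7$, the 180 deg/s speed limit and the 100 Hz loop rate come from the text.

```python
import math

def burst_output(b_in, peak=180.0, sigma=5.0):
    """Assumed saturating-exponential burst (deg/s); zero for non-positive input."""
    if b_in <= 0.0:
        return 0.0
    return peak * (1.0 - math.exp(-b_in / sigma))

def simulate_saccade(target, k=0.7, nucleus=0.0, dt=0.01, duration=1.0):
    """Return the eye displacement trajectory (deg) for one movement."""
    eye = 0.0        # eye displacement of an ideal velocity-driven plant
    integral = 0.0   # local feedback integrator state (integral of v)
    trajectory = []
    for _ in range(round(duration / dt)):
        # Net burst input: target command, imperfect feedback, nucleus term.
        b_in = target - k * integral + nucleus
        v = burst_output(b_in)
        integral += v * dt
        eye += v * dt
        trajectory.append(eye)
    return trajectory

target = 10.0
uncorrected = simulate_saccade(target)[-1]                        # settles near target / k
corrected = simulate_saccade(target, nucleus=(0.7 - 1.0) * target)[-1]
```

With $k < 1$ the feedback underestimates the eye progress, so the uncorrected movement overshoots, which is the faulty behavior the brain-stem is described as reproducing. In this toy setting a negative nucleus contribution of the right size removes the overshoot.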

Learning mechanisms
The main function of the adaptive filter model of the cerebellum is to learn from the experienced sensorimotor errors with the objective of reducing them. Errors experienced during movements result in compensatory plasticity at different sites of the cerebellar adaptive filter (Gao et al 2012). For the current model, PF-PC synapses are considered to be the only zone of plasticity.
From an optimization perspective, the PF and CF activity dependent plasticity in previous models (Porrill and Dean 2007, Porrill and Dean 2008, Fujita 1982) has been derived as a gradient-descent-based solution to the minimization of the mean-squared movement error, as described briefly in equation (10).
where $\delta e(t)$ is the movement error at time $t$ from the start of the movement, while $\lambda \sum_{j=1}^{N} \delta w_{pf-pc_j}^{2}$ penalizes high synaptic parameters. Optimal synaptic weight modifications ($\delta w_{pf-pc}$) that minimize this mean-square error cost were derived as gradient-based incremental plasticity rules, as briefly represented in equations (11) and (12),
where $n$ represents the index of the incremental weight update step, $\eta$ is the learning rate, and $\nabla J(\delta w^{(n)}_{pf-pc}) = \partial J / \partial w_{pf-pc_j}$ represents the gradient direction of the movement cost $J$ in the adjustable synaptic parameter space of $\delta w_{pf-pc_j}$. The cost gradient is related to the co-occurrence of PF activity and the continuous movement error transmitted as CF activity, and results in biologically plausible local weight update rules (Porrill and Dean 2007).
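For orientation, a generic regularized mean-squared-error cost and its gradient-descent update, written with the symbols defined above, could take the following form. This is an assumed textbook form, given only as a sketch; the normalization, the time discretization and the use of $T$ as the number of time samples are assumptions.

```latex
\begin{align*}
J &= \frac{1}{T}\sum_{t=1}^{T} \delta e(t)^{2}
     + \lambda \sum_{j=1}^{N} \delta w_{pf-pc_j}^{2}, \\
\delta w^{(n+1)}_{pf-pc} &= \delta w^{(n)}_{pf-pc} + \Delta \delta w^{(n)}_{pf-pc}, \\
\Delta \delta w^{(n)}_{pf-pc} &= -\eta \, \nabla J\big(\delta w^{(n)}_{pf-pc}\big).
\end{align*}
```

If the error depends linearly on the PC output, and the PC output is a weighted sum of PF activities $p_j$, each gradient component reduces (up to a plant-dependent gain) to the time average of $\delta e(t)\,p_j(t)$ plus the weight-decay term $2\lambda\,\delta w_{pf-pc_j}$, which is the covariance-like structure referred to in the text.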
However, this mean-squared movement error based cost function does not explain two crucial properties of saccade movement adaptation (Soetedjo and Fuchs 2006, Soetedjo et al 2009). First, if the end movement error is the only quantitative information available, how does it result in trajectory control that improves movement speed and accuracy? Second, the movement adaptation depends solely on the direction of the error rather than on its precise magnitude.
To examine this puzzle, we formulated a movement cost that penalizes low changes in the adaptive filter output activity throughout the eye movement duration $t$, the sensory error only at the end of the saccadic eye movement at $t = T_{end}$ (as dictated by biology), and high synaptic weights, as presented below:
where $|x|$ represents the absolute value of a given variable $x$, and $\alpha$, $\beta$, $\lambda$ are positive penalty coefficients. The first cost term, $J_{prim}$, penalizes low adaptive filter PC output during the primary eye movement, from $t = 0$ to $t = T_{end}$. Since the PC output is applied to control the movement commands, and this term is available throughout the movement duration, $J_{prim}$ can be applied to modulate the motor commands and subsequently the movement kinematics throughout the movement duration. The error-related term, $J_{end}$, is composed of the visual foveation error observed only at time step $t = t_{error}$, after the completion of the movement at time $t = T_{end}$, instead of the entire trajectory of error. Note that $t_{error} > T_{end}$, which means the error information is available only after the movement ends.
Additionally, even this end foveation error is available only for a brief duration, in the form of complex spike activity caused by a low-probability CF event that starts at $t_{error}$ and lasts for the width of the error pulse (see figure 2).
Taking the temporal nature of the cost terms into account, the cumulative costs incurred by the two terms $J_{prim}$ and $J_{end}$ across the duration of the saccade, $t$, and the target stimuli, $s$, can be described as:
For simplicity in the mathematical expressions, we omit the summation over stimuli in the rest of the paper. It should be noted that the weight update rule additionally includes this across-stimulus summation in the implementation.
By applying gradient-descent-based incremental updates to minimize the cost $J$, as depicted in equations (11) and (12), we derived incremental learning rules that ensure continuous improvement in movement speed as well as accuracy. Importantly, the use of the absolute value of the foveation error, denoted by $|\delta e|$ in equation (15), instead of the commonly used least-mean-square error results in a learning rule that depends only on the direction of the error (as shown in the mathematical derivation in appendix A, which is available in the supplementary material). Overall, the incremental weight update has been derived to be,
where $\eta$ corresponds to the learning rate. Additionally, the coefficient $\psi$, referred to as the eligibility constant, temporally aligns the PF signal with the CF signal that contains the delayed error information. $\mathrm{sgn}(x) = |x|/x$ is the sign of a given input quantity $x$. The first term in the brackets of equation (16) is the weight update that depends upon the PF activity ($p_j$) in the absence of any error information $\delta e$. The $\mathrm{sgn}(\delta r^{(t)}_{pc})$ term ensures that the weights corresponding to both the rise and the fall in PC activity with respect to the background activity are increased. Note that the PF activity is integrated over the duration of the primary eye movement, from $t = 0$ to $t = T_{end}$, and can modify the motor commands throughout the movement duration. The second term in the brackets is the most common error ($\delta e$) and PF activity ($p_j$) based covariance update (Sejnowski 1977, Fujita 1982, Dean et al 2002), which is responsible for sensory error reduction. One difference is in the integration time, which starts at $t = t_{error}$, the onset of the error-dependent CF activity, and spans the width of the error pulse depicted in figure 2. The other difference is that the weight update depends only on the sign of the error. Had the cost been written in terms of the squared error, this term would have contained the magnitude of the error in the weight update. All the parameters relevant to the presented model are listed in table 1.
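The structure of a two-term, sign-dependent update of this kind can be sketched as follows. This is an illustrative sketch, not the rule of equation (16): it assumes a cost of the form $-\alpha\sum_t|\delta r_{pc}(t)| + \beta|\delta e| + \lambda\sum_j \delta w_j^2$, a linear PC read-out, and an error term that pairs the error sign with the PF activity summed over the error pulse. The sign conventions, the coefficient values, and the names `sign_based_update`, `pf_move` and `pf_error` are all hypothetical.

```python
import numpy as np

def sign_based_update(dw, pf_move, pf_error, error, alpha=0.01, beta=1.0,
                      lam=1e-3, psi=1.0, eta=0.1):
    """One incremental update of the PF-PC weights `dw` (length N).

    pf_move  : (T, N) PF activity during the primary movement
    pf_error : (D, N) PF activity aligned with the error pulse
    error    : scalar end-point error; only its sign is used
    """
    dr_pc = pf_move @ dw                               # PC modulation at each time step
    vigor_term = alpha * (np.sign(dr_pc) @ pf_move)    # error-free, PF-driven term
    error_term = beta * psi * np.sign(error) * pf_error.sum(axis=0)
    decay_term = 2.0 * lam * dw                        # penalty on large weights
    return dw + eta * (vigor_term - error_term - decay_term)

rng = np.random.default_rng(1)
pf_move = rng.random((20, 5))      # synthetic PF activity, 20 steps, 5 fibres
pf_error = rng.random((10, 5))     # synthetic PF activity during the error pulse
dw0 = 0.1 * rng.normal(size=5)

small = sign_based_update(dw0, pf_move, pf_error, error=0.1)
large = sign_based_update(dw0, pf_move, pf_error, error=5.0)
flipped = sign_based_update(dw0, pf_move, pf_error, error=-0.1)
```

Here `small` and `large` are identical because only the direction of the error enters the update, while `flipped` differs. Replacing `np.sign(error)` with `error` would give the magnitude-dependent rule that a squared-error cost produces.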

Details of the experiment

Humanoid oculomotor configuration
To validate the proposed adaptive filter learning principles for optimizing fast reaching movements, we implemented the saccade adaptation task on the iCub humanoid robot (Beira et al 2006). The robot has three degrees of freedom for eye movements: a common tilt on the vertical axis, and separate pans on the horizontal axis. The APIs allow control of the two pans only in terms of vergence and version. The oculomotor control loop runs at 100 Hz. The maximum speed of the eye joints on the horizontal axis is 180 deg s$^{-1}$, which is lower than that of the biological oculomotor system, which is approximately 1000 deg s$^{-1}$ for monkeys (Fuchs 1967) and 700 deg s$^{-1}$ for humans (Boghen et al 1974). However, this peak speed discrepancy only results in trajectories with longer durations in the robot experiments, without altering their shape. The experiments were performed in the horizontal direction, using eye version, with the iCub operating in velocity control mode.

Movement adaptation paradigms
Adaptive filter weights, $w_{pf-pc}$, must be adjusted in response to sub-optimal movements (following equation (13)). These sub-optimal movements themselves can be caused by the insufficiency of brain-stem control for a given oculomotor system, or by abrupt jumps in the visual target during the eye movement. We tested the adaptive filter capabilities in both of these conditions.
Figure 2. iCub target jump experiment. The iCub, shown in the left panel, is required to move its eye from an initial focus location (represented as black lines and black circle) to a target location (represented as red lines and red circle), with $y_d$ as the desired eye displacement. As the movement initiates, the target is shifted to an arbitrary new location (shown as a blurred red circle), resulting in a foveation error $e$ even for an appropriate eye displacement $y_d$, as depicted in the middle panel of the figure. The foveation error $e$ caused by this intra-saccadic target jump is observed only at the end of the eye movement, while no sensory information is available regarding the whole eye movement trajectory. The inset picture (the right panel) shows the tentative nature of the foveation error signal used to drive the iCub saccade adaptation.
From-scratch adaptation (FSA) paradigm. In this paradigm, the adaptive filter begins with zero PF-PC weights ($\delta w_{pf-pc}$), resulting in no adaptive filter contribution to the motor commands at the beginning of the adaptation procedure. Hence, before the adaptation trials, the eyes move erroneously to a given random target (sampled between 2 deg and 30 deg displacement), with the movement determined solely by the burst generator characteristics. The adaptive filter weights are adjusted according to the update rules described in equations (13)-(16) in a batch mode with a fixed sample size, after a set of eye movements to randomly sampled target stimuli in the visual field. The complete set of eye movements towards randomly sampled stimuli in the fixed batch is referred to as a single roll-out. Batch updates refer to a procedure where weight updates are performed at the end of each roll-out and the weights remain constant for movements within a single roll-out. These learning roll-outs are repeated until the cumulative cost from equation (13) on the training set converges stably. The efficacy of learning is characterized by movements to randomly generated, fixed test target locations.
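The batch roll-out procedure described above can be sketched with a toy scalar example. It is a minimal sketch under stated assumptions: a single learned gain `w` stands in for the PF-PC weights, the function `endpoint` is a hypothetical stand-in for one robot saccade (a brain-stem overshoot of 1/k plus the learned correction), and the batch size, learning rate and number of roll-outs are arbitrary. Only the 2 to 30 deg target range and the sign-only error feedback follow the text.

```python
import random

def endpoint(target, w, k=0.7):
    """Toy stand-in for one saccade: overshoot gain 1/k plus a learned gain w."""
    return (1.0 / k + w) * target

def run_rollout(w, rng, batch_size=20):
    """Weights stay fixed within the roll-out; returns the averaged update and cost."""
    grad, cost = 0.0, 0.0
    for _ in range(batch_size):
        target = rng.uniform(2.0, 30.0)          # deg, range taken from the text
        error = endpoint(target, w) - target     # observed only after the movement
        sign = (error > 0) - (error < 0)
        grad += sign * target                    # update uses only the error direction
        cost += abs(error)                       # magnitude kept for monitoring only
    return grad / batch_size, cost / batch_size

rng = random.Random(0)
w, eta, costs = 0.0, 0.001, []
for _ in range(100):                             # one weight update per roll-out
    grad, cost = run_rollout(w, rng)
    costs.append(cost)
    w -= eta * grad
```

Because the update uses a fixed step in the direction of the error sign, the gain approaches the value that cancels the overshoot (1 - 1/k in this toy setting) and then hovers around it within roughly one step size. In practice a stopping rule on the monitored cost would replace the fixed number of roll-outs.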
Target jump adaptation paradigm. In this paradigm, the adaptive filter learning compensates for sensory errors caused by abrupt target jumps during the eye movements, as depicted in figure 2. We followed a protocol similar to that used in saccade adaptation experiments on humans and primates (McLaughlin 1967). First, the robot is commanded to move towards a randomly generated target location. During the movement the target is displaced to a new location, resulting in a foveation error that is available only for 100 milliseconds after the end of the eye movement.
Figure 3. [...] (Robinson et al 1993). (a) Eye displacement to a given target location. (b) Corresponding eye speed. (c) Characteristic total movement cost, $J$, with respect to increasing learning trials/roll-outs. Yellow vertical lines represent the sufficiently long, but fixed, time horizon for minimizing the total cost $J$, from $t = 0$ to $t = T_{end}$.

Adaptive filter learning leads to improvement in movement speed and accuracy
In this section we evaluate the capability of the proposed learning rules to counter the inaccurate and slower eye movements produced by the brain-stem burst controller. The training procedure follows the FSA paradigm described in the methods. The differences between the eye movement characteristics of the iCub robot before and after the adaptation are illustrated in figure 3. Figure 3(a) depicts the pre- and post-adaptation eye displacement attained for an exemplary test target location, while figure 3(b) presents the speed modulation effected by the same adaptation. The total cost ($J$, equation (13)) of the saccadic eye movement is plotted as a function of the number of adaptation trials or roll-outs in figure 3(c). As can be observed, during the adaptation the eye speeds increased in peak value, with a simultaneous reduction in the eye movement duration. The trained model did not adopt the alternative strategy of reaching the target with unchanged peak speed by modulating only the duration of the movement. Notably, the behavioral strategy followed by our model agrees with biological observations. The differences between the pre- and post-adaptation trials are qualitatively comparable to those between monkey eye movements with bilateral deactivation of the deep cerebellar nucleus activity and with an intact nucleus [see figure 12 in (Robinson et al 1993)]. Naturally, as the maximum achievable speeds in the robot and in humans/monkeys differ, exact quantitative similarities in the kinematics cannot be expected.
This result highlights the applicability of the proposed weight update rules to modulate the accuracy as well as the speed and duration of the eye movement, even when only the end foveation error information is available.

Robustness of the method to increase in the movement dimensions and changes in the target representation
Given that our adaptive control method was able to achieve low movement costs in the previous case, we tested its general applicability by changing the target specification from a joint angle representation to a pixel-based representation in the iCub camera. Moreover, the robot was free to move its eyes in the horizontal (X-axis) and vertical (Y-axis) directions by using its pan and tilt rotations. Even in this case, the learning signal consists of the direction of the reaching error in the X and Y directions relative to the centroid of the camera image, rather than the exact magnitude of the reaching error. Hence, we have a signed directional error vector in the X and Y camera coordinates. The network was trained on random pixel-based targets that appeared in the iCub camera frame.
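The sign-only learning signal described above can be illustrated with a minimal sketch. The function name, the image size and the pixel values are hypothetical and chosen only for illustration; the sketch assumes the error is measured as the target position minus the image centre.

```python
import numpy as np

def directional_error(target_px, image_center_px):
    """Return the sign (-1, 0 or +1) of the reaching error along X and Y.

    Only the direction of the error is kept; its magnitude is discarded.
    """
    offset = np.asarray(target_px, dtype=float) - np.asarray(image_center_px, dtype=float)
    return np.sign(offset).astype(int)

# Hypothetical 320x240 image: target at smaller X and larger Y than the centre.
error_sign = directional_error(target_px=(100, 180), image_center_px=(160, 120))
```

In this sketch a target that lies exactly on the image centre yields a zero vector, so no directional correction would be signalled.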
After training the controller on a set of pixel locations using the FSA paradigm, resulting in the total cost (in auxiliary units) shown in figure 4(b), we present the performance of the system on a test set of 10 randomly selected locations in figure 4. Figure 4(a) shows the eye displacement (red line) to the test targets (blue dots). Figure 4(b) shows the reduction in the cumulative reaching error post adaptation. The post-adaptation reaching error has a median of 1.1 deg (inter-quartile range of 0.51 deg), which is clearly reduced compared to the median of 7.8 deg (inter-quartile range of 3.9 deg) in the pre-adaptation trials (see figure 4(c)).
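For readers who wish to report reaching errors in the same way, the median and inter-quartile range can be computed as in the sketch below. The error values are made up for illustration and are not the measurements reported above; the helper name is hypothetical.

```python
import numpy as np

def median_and_iqr(errors_deg):
    """Return the median and inter-quartile range (Q3 - Q1) of reaching errors."""
    errors = np.asarray(errors_deg, dtype=float)
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    return median, q3 - q1

# Made-up reaching errors in degrees, for illustration only.
example_errors = [0.6, 0.9, 1.0, 1.2, 1.5]
med, iqr = median_and_iqr(example_errors)
```

The sketch relies on the default linear interpolation of `np.percentile`; other quartile conventions can give slightly different values for small test sets.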

Separate adaptive filter plasticity mechanisms exert distinct control on the eye movement optimality
The w pf -pc weight update involves multiple terms that are separately active during the eye movement, t ∈ (0, T end ), and at the end of the eye movement, t > T end . In this section, we illustrate why each term is important in maintaining a specific aspect of movement optimality. Importantly, we present how the resultant weight update rules are similar to the local synaptic plasticity rules reported in the cerebellum.
In the derivation of the plasticity rules, we assumed that the penalty (J prim ) results in a solely PF-activity-dependent increase in PF-PC weights during the eye movement, independent of the error information. J end , on the other hand, accounts for the error-dependent CF and PF covariance weight update. Figures 5(a) and (b) depict the characteristic eye movement kinematics after adaptation in two different cases using the FSA paradigm. In the first case, the total desirability of the saccades comprises both the J prim and J end terms. In the second case, the only desirability of the eye movements is to reach the target accurately, considering only J end , without any further constraints on optimality. Although both kinds of adaptation lead to accuracy in reaching a given target, significant differences can be observed in the resultant trajectories. The adaptation regulated by the combined J prim + J end factors leads to increased speed and reduced duration in reaching the target [blue movement profiles in figures 5(a) and (b)]. J end -only regulated adaptation achieves accuracy by decreasing the peak eye speed and letting the eye movement follow durations close to those of the pre-adaptation trials (red movement profiles). This indicates that the error-dependent J end penalty accounts only for the precision in reaching a given target displacement. Hence, using only the end foveation error as the movement fitness would result in the eye not reaching the target as fast as possible. On the other hand, adding a penalty on low PC activities (imposed by J prim ) can modulate eye speed and duration by taking only the PF activity into account, without the need for any sensory feedback during the movement.
Figure 5. The proposed fitness J regulates the optimal movement characteristics, while also explaining the functional significance of local plasticity rules in the cerebellum. (a) and (b) show the different trajectories obtained with the purely error-dependent fitness formulation (only J end , shown in red) in comparison with the inclusion of the additional penalty for output PC activities (J prim , shown in blue). (c) shows the behaviour of the learning in terms of the decrease in total cost, flight time and foveation error. Finally, (d) and (e) compare the synaptic weights resulting from learning with different cost terms, showing that in the J end -only case, a depression mechanism that depends upon the foveation error is prominent.
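The two-term structure of the weight update discussed above can be sketched as follows. This is not the update rule of equation (16): the function name, the learning rates alpha and beta, the sign convention of the error term, and the use of the mean PF activity as the PF factor in the error-dependent term are all assumptions made purely for illustration.

```python
import numpy as np

def pf_pc_weight_update(pf_activity, error_sign, dt, alpha=0.01, beta=0.1):
    """Illustrative two-term PF-PC weight change for one saccade.

    pf_activity : array of shape (time steps, number of PFs), PF rates during the movement
    error_sign  : -1, 0 or +1, direction of the end foveation error (CF signal)
    dt          : sampling interval of pf_activity in seconds
    """
    pf = np.asarray(pf_activity, dtype=float)
    # Error-independent term: PF-activity-dependent increase accumulated during the movement.
    dw_during = alpha * pf.sum(axis=0) * dt
    # Error-dependent term: CF signal times PF activity, applied once after the movement.
    # (Assumed sign convention: a positive error sign depresses the weights.)
    dw_end = -beta * error_sign * pf.mean(axis=0)
    return dw_during + dw_end

# Made-up PF rates for two time steps and two parallel fibres.
pf_example = np.array([[1.0, 2.0], [3.0, 4.0]])
dw_no_error = pf_pc_weight_update(pf_example, error_sign=0, dt=0.5, alpha=0.1, beta=0.2)
dw_with_error = pf_pc_weight_update(pf_example, error_sign=1, dt=0.5, alpha=0.1, beta=0.2)
```

With a zero error sign only the error-independent term acts, so the weights can only grow in this sketch; a non-zero error sign adds a correction whose direction follows the assumed sign convention.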
A depiction of the cost behaviors in each case is presented in figure 5(c). The total cost starts at the same value in both cases because the PF-PC synapses are initialized to zero. However, J end -only regulated adaptation saturates at a higher value than the J prim + J end case (left panel of figure 5(c)). The main differences can be observed in the plots of flight time and foveation error. The inclusion of J prim in the adaptation trials results in a reduction of flight time over an increasing number of learning trials. On the other hand, J end -only modulated adaptation does not have any significant effect on the flight time [middle panel of figure 5(c)]. Furthermore, the foveation error plots show that the J end cost leads to quick improvement of movement accuracy (with foveation error close to zero). In contrast, the composite cost function converges slowly in terms of movement accuracy, because the speeding of the eye caused by the J prim term opposes movement accuracy (right panel of figure 5(c)). This results in a pronounced saw-shaped waveform when the reaching error is reduced to 0 deg, because the contribution of the J end term is then nearly nullified relative to the opposing J prim term. This saw-shaped waveform in the reaching error can be controlled by adjusting the α and β coefficients that determine the relative penalties on the error-independent and error-dependent costs, respectively.
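To make the role of α and β concrete, one may assume that the total cost is a weighted sum of the two terms. This form is an assumption made here for illustration; the exact expression of the total cost is given by equation (13) and may differ.

```latex
J = \alpha \, J_{\mathrm{prim}} + \beta \, J_{\mathrm{end}}, \qquad \alpha, \beta \ge 0
```

Under this assumed form, increasing β relative to α gives more weight to end-point accuracy, whereas increasing α relative to β strengthens the error-independent drive towards shorter flight times.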
The changes in the PF-PC synaptic strengths that are responsible for modulation of eye movement kinematics, due to their effect on the cerebellar output r nuc , are presented in figures 5(d) and (e).
Adaptation to the composite cost is enforced by an increase in the PF-PC synaptic strengths w pf -pc in the positive and negative regions, as depicted in figure 5(d). In contrast, J end -only adaptation does not cause a significant w pf -pc increase in the positive weight space, but a pronounced reduction in the negative weight space. It should be noted that positive and negative weights are used for simplicity in the adaptive filter model. In biological systems, however, the positive PF-PC strength arises from excitatory synaptic connections onto the PCs, and the negative PF-PC strength arises from the inhibitory connections of the molecular interneurons onto the PCs (Jörntell et al 2010).
Figure 6 (partial caption). ... (d) show the proportionality between peak PC population activity and the amplitude and speed of the robot movement, a phenomenon observed in biological saccadic movements. However, there are quantitative differences in the amount of response and duration of the activity due to differences in robot operation. Yellow vertical lines represent the sufficiently long, but fixed, time horizon for minimizing the total cost J, from t = 0 to t = T end.
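The interpretation of signed PF-PC weights as separate excitatory and inhibitory pathways can be sketched as below. The function name and the weight values are hypothetical; the sketch only shows how a signed weight vector decomposes into two non-negative parts.

```python
import numpy as np

def split_signed_weights(w):
    """Split signed weights into non-negative excitatory and inhibitory parts.

    The signed weights are recovered as excitatory - inhibitory.
    """
    w = np.asarray(w, dtype=float)
    excitatory = np.clip(w, 0.0, None)    # positive part (direct PF-PC pathway)
    inhibitory = np.clip(-w, 0.0, None)   # magnitude of negative part (interneuron pathway)
    return excitatory, inhibitory

# Made-up signed weights, for illustration only.
w_exc, w_inh = split_signed_weights([0.4, -0.2, 0.0])
```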

Adaptive filter displays similarity to cerebellum recordings
We further asked whether the model adaptive filter can predict any of the recordings from the cerebellum during saccadic eye movements.
Recently, it was observed that PC populations in the biological cerebellum display a definite prediction of the saccadic eye movement kinematics (Herzfeld et al 2015). In order to emulate these biological recordings, in this section we take into account the preparatory activity displayed by the MFs and PCs before movement onset (t < 0 s), which indicates motor planning. The MF activity regarding the target orientation is simulated to build up gradually to the tonic level at movement initiation (similar to the MF activity in (Gad and Anastasio 2010)), instead of a sudden build-up at t = 0 s. This provides the basis for motor preparation in the downstream adaptive filter nodes in the granule layer and the PC layer. Figure 6(a) illustrates the resulting adaptive filter PC population activity at various saccade speeds. A definite correspondence can be observed between the plots of different eye speeds and the respective PC population activities. Figure 6(b) presents similar PC population activity plots at different saccade amplitudes. It is important to note that the negative PC population activity in figures 6(a) and (b) represents the drop in PC activity from the spontaneous/background firing rate, and does not suggest the existence of a negative neuronal spike frequency in Hz (or spikes/s). The main observation from the PC layer is the near-linear increase in the peak PC activity for both increasing eye speeds and eye displacements, shown in figures 6(c) and (d), respectively. This is in line with the neurophysiological observations in (Herzfeld et al 2015) regarding the correlation of cerebellar PC population activity with saccade kinematics.
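The gradual build-up of MF activity before movement onset can be sketched as a simple ramp. The linear shape, the function name and the parameter values are assumptions for illustration only; the text above states only that the activity builds up gradually to the tonic level at t = 0 s.

```python
import numpy as np

def mf_preparatory_ramp(t, tonic_rate, prep_duration):
    """MF rate rising linearly from 0 at t = -prep_duration to tonic_rate at t = 0.

    Before the preparatory window the rate is 0; from t = 0 onwards it stays tonic.
    """
    t = np.asarray(t, dtype=float)
    ramp = np.clip((t + prep_duration) / prep_duration, 0.0, 1.0)
    return tonic_rate * ramp

# Hypothetical values: 50 Hz tonic level reached after a 0.2 s preparatory ramp.
t = np.array([-0.3, -0.2, -0.1, 0.0, 0.1])
rates = mf_preparatory_ramp(t, tonic_rate=50.0, prep_duration=0.2)
```

Replacing the ramp by a step at t = 0 s would correspond to the sudden build-up that the preparatory activity is meant to avoid.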
However, the adaptive filter PC activity differs from the biological observations in two aspects:
1. The adaptive filter PC layer is active over durations of 1-1.5 s, whereas biological PC populations are active over 200-300 ms. This is because the biological eye movement typically ends within 100-150 ms, while the robot movement lasts longer due to the limits on its peak speed. The relatively longer duration of the saccade command and the corresponding MF inputs to the adaptive filter actively sustain the PC layer activity for appropriate control of the robot eye movement.
2. The peak adaptive filter PC activities are in the range of 50-200 Hz, while the PC population activities in the biological experiments (Herzfeld et al 2015) are in the range of 1000-1500 Hz. This is due to the different scaling of neuronal activity in biological motor control and in the presented robot control.

Adaptation to target jump
Target jump adaptation experiments were carried out as described in the methods section. Our aim is to see whether our model can predict the changes in average PC population activity and synaptic updates that might occur during the de facto standard target jump experiments in monkeys. In figure 7, we present a characteristic adaptation result for a target that is initialized at approximately 20 deg and jumps to 16 deg during the eye movement. This results in a foveation error close to 4 deg when using the adaptive filter configuration previously trained under the FSA paradigm to compensate for inaccurate brain-stem control. Figure 7(a) depicts the continuous reduction in the saccade amplitude reached by the iCub eye with an increasing number of roll-outs. In figure 6, it can be seen that the PC population activity displays an early rise from the spontaneous/background activity (referred to as the burst in PC activity), and a late dip from the background level (referred to as the pause in PC activity), approximately at the time when the eye reaches peak speed. We calculated the difference in the average burst and pause activities in the model PCs before and after the saccade adaptation trials, and plotted the results in figures 7(b) and (c). The average PC population burst decreases with a shallower slope (reduction of 3 Hz in 50 roll-outs) than the average PC population pause (reduction of 10 Hz in 50 roll-outs). These results are qualitatively similar to the modulation in simple spike responses of PC populations over an increasing number of adaptation trials in monkeys (see figures 6 and 7 in (Kojima et al 2010)). Figure 7(d) depicts the bi-directional changes in PF-PC connection strengths.
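One possible way to quantify the burst and pause described above is sketched below. The definitions used here (mean deviation above and below a baseline rate) and the function name are assumptions; the averaging procedure actually used for figures 7(b) and (c) is not specified in this section, and the rates are made up.

```python
import numpy as np

def burst_and_pause(pc_rate, baseline):
    """Return the mean rise above baseline (burst) and mean dip below it (pause), in Hz."""
    deviation = np.asarray(pc_rate, dtype=float) - baseline
    above = deviation[deviation > 0]
    below = deviation[deviation < 0]
    burst = above.mean() if above.size else 0.0
    pause = -below.mean() if below.size else 0.0
    return burst, pause

# Made-up PC population rates (Hz) around a 50 Hz baseline.
burst, pause = burst_and_pause([50, 60, 70, 40, 30, 50], baseline=50)
```

Applying the same measure to the PC activity before and after the adaptation trials would give the change in burst and pause per roll-out under these assumed definitions.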

Discussion
We have presented a model of the cerebellum based on adaptive filter operation for accurate control of fast reaching movements in the presence of restricted sensory information. The only sensory information required to enforce adaptive corrections in control was the sign of the end foveation error, available at the completion of the movement.
Furthermore, even in the presence of the mentioned sensory constraints, we proposed that the adaptive filter actively influences the entire movement trajectory to improve the cumulative fitness with respect to flight time (movements with high vigor) and accuracy. We have presented a mathematical derivation of the local PF-PC weight updates from a behavioral-level specification of the task.
The model was successfully implemented on saccadic eye movements of the iCub humanoid robot, with a continuous improvement in trajectory fitness over a period of adaptation trials. The method was applicable even in camera pixel coordinates rather than joint coordinates, and is of potential use to robot gaze control. The adaptive filter PC activity is in qualitative agreement with the neurophysiological observations in the monkey cerebellum (Herzfeld et al 2015, Kojima et al 2010). However, there are explainable quantitative differences between the model results and the biological system. Chiefly, as the robot moves at low speeds compared to the biological eye, the MF and brain-stem signals in the robot experiments are prolonged compared to the biological system. This results in certain accountable differences in the amplitude and duration of the adaptive filter PC layer activity compared to the biological PCs.

Simplification of adaptive control by cerebellum-like processing
One of the intentions behind this cerebellum model was to pave the way towards less computationally intensive algorithms for artificial systems, such as local covariance-based learning, which the cerebellum is commonly thought to carry out by using the perceived sensory errors as supervisory instructive signals for task execution (Marr 1969, Porrill and Dean 2007). In previous studies (Harris 1998), it was shown that the visual error information available to the biological cerebellum cannot explain the minimum-time characteristics of fast eye movements. This led to a hypothesis of possible reinforcement-learning-based exploration in the adaptive parameter space of the cerebellum (Harris 1998). Reinforcement learning is a computationally expensive strategy due to the exploration over a large number of synaptic connection parameters. The other option to tackle the trajectory correction was to have a priori knowledge, or a model, of the plant itself (Chen-Harris et al 2008, Saeb et al 2011). In contrast, we deduced from saccade behavioral studies that, instead of increasing the complexity of the model or the computational algorithm, the cerebellum could be simplifying the trajectory fitness (or cost) function itself to accommodate the sensory constraints. Local covariance-based plasticity mechanisms in our study ensured that the movements are accurate as well as optimal, even if the error information is highly constrained. In another related study, we empirically demonstrated a similar approach on the kinematic control of a high-degree-of-freedom soft-robot simulation for online adaptive control, without the need for computationally expensive adaptation algorithms (which demand a separate offline learning/exploration phase) (Kalidindi et al 2019).

Biological relevance of the derived plasticity rules
The adaptive filter plasticity rules have been derived purely from the specification of behavior-level objectives of the saccadic eye movements, by accounting for several, though not all, of the information constraints faced by the biological cerebellum. Hence, it is worthwhile to compare the model characteristics to those of realistic bottom-up spiking neural network models of the cerebellum , Antonietti et al 2016. Realistic cerebellar models comprise several neuronal types with different regions of plasticity as presented in . In our model, we focused on the PF-PC synaptic plasticity and functionally divided it into two terms as presented in equation (16): (i) an error-independent term that minimizes the eye movement cost J prim and (ii) an error-dependent term that minimizes the movement-error-related cost J end available after the end of the movement. In more biorealistic models (Casali et al 2019), even the synaptic projections to the PC layer are divided into direct PF-PC connections that are excitatory in nature, and indirect PF-MLI-PC connections that have an inhibitory effect on the PCs. Further, both these direct and indirect synaptic pathways to the PC layer can display plasticity Dean 2008, Jörntell et al 2010).
In our experiments, the error-independent term resulted in an early increase in the PC population activity, as depicted by the net increase in positive (or excitatory) PC synaptic strengths in figure 5(d), and resulted in faster movements. This increase in PC population activity can be biologically achieved by means of error-free long-term potentiation (LTP) of the direct excitatory PF-PC synaptic connections, or error-free long-term depression (LTD) of the PCs by means of the indirect PF-MLI inhibitory connections. On the other hand, the error-dependent plasticity term determined the late reduction in the overall PC population activity below the spontaneous level in order to decelerate the eye, and directly affected the eye movement accuracy. This can be associated with an LTD mechanism driven by the movement error (via CF activity) in the direct PF-PC synaptic pathway, or with an LTP mechanism through the indirect PF-MLI connections onto the PCs. Importantly, it should be noted that the error-independent plasticity in the model is related to the PF-PC activity during the movement, in contrast to the error-dependent plasticity that occurs at the end of the movement. Considering these common features, it would be beneficial to combine the insights from our adaptive control model with more biorealistic bottom-up models (Carrillo et al 2008, Antonietti et al 2016) in order to bridge between behavior-level