Optimal Task-Dependent Changes of Bimanual Feedback Control and Adaptation

Summary The control and adaptation of bimanual movements is often considered to be a function of a fixed set of mechanisms [1, 2]. Here, I show that both feedback control and adaptation change optimally with task goals. Participants reached with two hands to two separate spatial targets (two-cursor condition) or used the same bimanual movements to move a cursor presented at the spatial average location of the two hands to a single target (one-cursor condition). A force field was randomly applied to one of the hands. In the two-cursor condition, online corrections occurred only on the perturbed hand, whereas the other movement was controlled independently. In the one-cursor condition, online correction could be detected on both hands as early as 190 ms after the start. These changes can be shown to be optimal in respect to a simple task-dependent cost function [3]. Adaptation, the influence of a perturbation onto the next movement, also depended on task goals. In the two-cursor condition, only the perturbed hand adapted to a force perturbation [2], whereas in the one-cursor condition, both hands adapted. These findings demonstrate that the central nervous system changes bimanual feedback control and adaptation optimally according to the current task requirements.


System Model
The model and its parameters are to a large degree derived from previous work on the two-degree-of-freedom arm [S1]. Because of the interactions between the elbow and shoulder joint, this system is nonlinear. Given the short length of the reaching movement (10 cm), I used a local linear approximation of the system in Euclidian coordinates. However, very similar results can be obtained with a nonlinear, joint-based model, by using iterative techniques to derive the optimal control policy [S2]. In the model, a state vector that contains 12 variables conceptualizes each hand. The state vector is composed of the x and y Euclidian coordinates of position (p), velocity (v), and force (f), muscle activation (h), and target position (g). The kinematics are modeled with p t + 1 = p t + Dtv t v t + 1 = v t + Dt ft I ; (S1) in which Dtis the size of the discrete time step, t is the number of time steps, and I is the inertia of the arm at the endpoint (0.5 kg). I simulated the delay in the muscles by using twocoupled first-order low-pass filters, with time constants t 1 = t 2 = 40 ms [S3], that related the force (f) to the motor command (u). Thus, the produced force is a low-passfiltered version of the motor command u. Again, the ''muscle activations'' are conceptualized in Eucledian coordinates because I am using a local linear approximation to a nonlinear system: To simulate sensory delay, I expanded the state space with a series of four-coupled first-order filters for the sensed position, velocity and force: t ; with j = 1; .; 4: The current state of the system is x ð1Þ , and the state that can be sensed is x ð4Þ , in which x is used as a placeholder for p, v, or f. The time constant for each filter was t s = 15 ms.
In summary, the state vector for each hand is And the sensed state is By defining the appropriate matrices, the whole timediscrete system can be written as follows: It is important to note that A, B, and H are block diagonal; i.e., the left hand has no influence on the right hand and vice versa. The stiffness of the perturbed arm was the only system parameter that was fitted to the experimental data. I used the data from the two-cursor condition and minimized the squared difference between the observed and simulated perturbation. The estimated stiffness term was 129.9 N/m.
The system matrices can be generated by calling M = bimoc_system('system').

The Cost Functions
The cost function for the two-cursor condition is composed of three terms: The third term is the time-independent control cost. The first term reflects the cost of not being at the target location, and the second term reflects the cost of not having zero velocity. These costs increase exponentially over the course of the movement up to the maximal movement time (900 ms), reflecting the requirement of being at the target with zero velocity in the end of the movement: To derive the relative weights of the costs and the time constant of the exponential rise, I estimated these parameters to fit the velocity profile of unperturbed movements, resulting in t cost = 51 ms, c p = 40 m -2 , c v = 5 s 2 /m 2 , and w u = 5*10e-5/s. For the one-cursor task, I used the same cost function, but replaced the position term with a common spatial term (Equation 2). The time-dependent cost function can be written in the form of and these matrices are generated for the one-and twocursor conditions by J = bimoc_system('costfunction', M, 'cursor',number_of_cursors).

Derivation of the Optimal Control Policy
With the cost function, the Riccati Equations [S4] are used to calculated the optimal cost-to-go matrix V t and the optimal control gains L t . This computation is performed from the end to the start of the movement, iteratively moving backward in time: The control gains allow us to compute the motor commands, given an estimate of the current state, u t = 2 L t b x t , that minimize the cost function of the onecursor and two-cursor condition, respectively. The main prediction of optimal control theory, independent Experimental data from experiment 1, plotted in the same format as in Figure S1. Results are averaged across hands and participant and displayed such that the perturbed hand appears on the left. As the arm stiffness of the perturbed hand increases, the within-hand correction rate increases and the between-hand correction decreases.

Figure S3. Sensitivity Analysis of the Correction Rate in Respect to the Parameter Arm Stiffness
Sensitivity analysis of the correction rate in respect to the parameter arm stiffness (N/m). As the arm stiffness of the perturbed hand increases, the within-hand correction rate increases and the between-hand correction decreases.

S2
control for the two-cursor condition, and dependent control in the one-cursor condition arises from the fact that for the two-cursor cost function, all L t are blockdiagonal matrices, and for the one-cursor task, all L t have nonzero off-diagonal entries. The derivation of the optimal control policy for a given system M and a cost function J is obtained by V = oc_solveLQG(M,J).
Force-Field Simulations I then simulated the reaction of the system under the optimal control policy while a force field like in experiment 1 was applied to the arm. Each time step of the simulations consisted of the following: 1. Computation of the sensory inflow based on current state, y t = Hx t . 2. Measurement update of the state estimate b x t given the sensory data y t with the optimal Kalman gains. The standard deviation of the measurement noise was 0.02 m for position, 0.2m/s for velocity, and 1 N for force, relative to the state noise with a standard deviation 1 [S3]. 3. Calculation of the current control signal, with u t = 2 L t b x t . 4. Simulation of the next time step of the real system with the system equations ( Equation S6). For the first simulation, I used the system as described above. For another, I applied a leftward-pointing force field to the left hand, and for a last simulation, a rightward-pointing force field. 5. Time update of the Kalman filter to update the state estimate assuming that there is no force field applied to the hand. This step is the forward model prediction of the controller.
These steps are implemented in the function x = oc_runsystem(M,V,J);.
No noise was added to the motor output, and sensory noise was assumed to be of the same size as used in [S3]. Figure S1 shows the prediction of the model under the one-and two-object conditions. Figure S2 shows the mean experimental data in the same format, averaged over participants and hands. These figures can be regenerated with bimoc_simulate('simulate_plot') and bimoc_simulate('data_plot');

Sensitivity Analysis of Arm Stiffness
Most model parameters could be either derived from previous modeling work or fitted from the unperturbed movement. Only the stiffness of the perturbed arm, i.e., the mechanical resistance of the arm to the force field, had to be estimated by the adjustment of this parameter so that the squared differences between predicted and observed x velocity of the perturbed hand could be minimized ( Figures S1C and S2C). This resulted in an estimate of 129.9 N/m. The sensitivity of the predicted within-and between-hand correction rate depending on the value of the arm stiffness is displayed in Figure S3. As the arm stiffness of the perturbed hand increases, the within-hand correction rate increases and the between-hand correction decreases because more of the correction is performed by the mechanical resistance of the arm against perturbations. Thus, the exact sharing of correction between the arms depends on the stiffness parameter, which for our setup could not be derived with high confidence. Despite this limitation, the qualitative prediction of there being no sharing of the correction in the two-cursor condition and shared corrections in the one-cursor condition (in the correct order of magnitude) is stable across a wide range of parameters and model specifications.