Incomplete information about the partner affects the development of collaborative strategies in joint action

Physical interaction with a partner plays an essential role in our life experience and is the basis of many daily activities. When two physically coupled humans have different and partly conflicting goals, they face the challenge of negotiating some type of collaboration. This requires that both participants understand their partner's state and current actions. But how would the collaboration be affected if information about the partner were unreliable or incomplete? We designed an experiment in which two players (a dyad) were mechanically connected through a virtual spring but could not see each other. They were instructed to perform reaching movements with the same start and end position, but through different via-points. In different groups of dyads we varied the amount of information provided to each player about his/her partner: haptic only (the interaction force perceived through the virtual spring), visuo-haptic (the interaction force is also displayed on the screen), and partner visible (in addition to interaction force, partner position is continuously displayed on the screen). We found that incomplete information about the partner affects not only the speed at which collaboration is achieved (less information, slower learning), but also the actual collaboration strategy. In particular, incomplete or unreliable information leads to an interaction strategy characterized by alternating leader-follower roles. Conversely, more reliable information leads to more synchronous behaviors, in which no specific roles can be identified. Simulations based on a combination of game theory and Bayesian estimation suggested that synchronous behaviors correspond to optimal interaction (Nash equilibrium). Roles emerge as sub-optimal forms of interaction, which minimize the need to account for the partner.
These findings suggest that collaborative strategies in joint action are shaped by the trade-off between the task requirements and the uncertainty of the information available about the partner.


Model summary
We modelled plant behavior and task as a differential non-cooperative game with Gaussian noise and quadratic costs. We also assumed that each player has a state observer which also predicts the partner's actions. Model formulation is described in the main text and is summarized by the following equations.

Plant dynamics:

x(t + 1) = A · x(t) + B_1 · [u_1(t) + η_1(t)] + B_2 · [u_2(t) + η_2(t)]

and, for each player (i = 1, 2):

Optimal controller(s):

u_i(t) = −L_i(t) · x̂_i(t)

State observer(s):

x̂_i(t + 1) = A · x̂_i(t) + B_i · u_i(t) + B_{−i} · û_{−i}(t) + K_i(t) · [y_i(t) − H_i · x̂_i(t)]

where x̂_i is player i's estimate of the state, û_{−i} is his/her estimate of the partner's input, L_i is the optimal feedback gain, K_i is the Kalman gain and H_i is the observation matrix.

Model implementation
To study how joint coordination is influenced by uncertainty about the goals and actions of the partner, we applied the general computational framework described in the main paper to a sensorimotor version of the classic battle-of-the-sexes game. Partners were mechanically connected through a compliant virtual spring and had partly conflicting goals: reaching the same target through different via-points.

Dyad dynamics
In our simulated dyad movements, we approximated each player's upper limb and robot dynamics as a point mass m_i, i = 1, 2:

m_i · p̈_i(t) = f_i(t) − b · ṗ_i(t) + k · [p_{−i}(t) − p_i(t)] − m_i · g

where p_i(t) and p_{−i}(t) are the hand position vectors of, respectively, player i and his/her partner −i; m_i is the player's mass and f_i(t) is the muscle-generated force vector. We also assumed that each player is subjected to gravity and to a small viscous force (with coefficient b) accounting for the damping caused by muscles and soft tissue; k is the stiffness of the virtual spring connecting the players. In all simulations, consistent with the actual experiments (see the main paper), we took m_1 = m_2 = 2 kg, b = 10 N·s/m and k = 150 N/m. As in [1], we modelled the dynamics of muscle force generation as a second-order system:

τ² · f̈_i(t) + 2τ · ḟ_i(t) + f_i(t) = u_i(t)

where u_i(t) is the activation vector, which is taken as the system's input, and τ is the activation time constant, which we set to τ = 40 ms. By defining the overall state vector as x = [p_1, ṗ_1, f_1, ḟ_1, p_2, ṗ_2, f_2, ḟ_2]^T, the dyad dynamics can be rewritten in state-space form, with blocks expressed in terms of I_2 (the 2 × 2 identity matrix) and 0_2 (the 2 × 2 zero matrix).
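The point-mass dynamics above can be assembled into state-space matrices as in the short sketch below. This is a hedged illustration, not the paper's code: it keeps one Cartesian coordinate per player for brevity (the planar model stacks two copies of each block), absorbs gravity into the equilibrium shift described below, and assumes a critically damped second-order muscle filter, since the exact form used in [1] is not reproduced here.

```python
import numpy as np

# Continuous-time dyad dynamics, one Cartesian coordinate per player.
# State per player: [position, velocity, muscle force, muscle-force rate].
# Assumption: the muscle filter is tau^2*f'' + 2*tau*f' + f = u (a guess at
# the second-order dynamics referenced from [1]); gravity is omitted here
# because it is absorbed into the equilibrium shift.
m, b, k, tau = 2.0, 10.0, 150.0, 0.040   # parameters from the text

def dyad_matrices(m=m, b=b, k=k, tau=tau):
    A = np.zeros((8, 8))
    B1 = np.zeros((8, 1))
    B2 = np.zeros((8, 1))
    for i, base in enumerate((0, 4)):   # player 1 in rows 0-3, player 2 in rows 4-7
        other = 4 - base                # index of the partner's position
        p, v, f, g = base, base + 1, base + 2, base + 3
        A[p, v] = 1.0                   # p' = v
        A[v, f] = 1.0 / m               # v' = (f - b*v + k*(p_other - p)) / m
        A[v, v] = -b / m
        A[v, p] = -k / m
        A[v, other] = k / m             # spring coupling to the partner
        A[f, g] = 1.0                   # f' = g
        A[g, f] = -1.0 / tau**2         # g' = (u - f - 2*tau*g) / tau^2
        A[g, g] = -2.0 / tau
        (B1 if i == 0 else B2)[g, 0] = 1.0 / tau**2
    return A, B1, B2
```

The spring terms appear symmetrically in the two players' velocity rows, which is what mechanically couples the dyad.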
If η_1 and η_2 are process noise sources (one per player), assumed to be Gaussian with covariance Σ_{η_i}, the dynamics can be rewritten as:

ẋ(t) = A · x(t) + B_1 · [u_1(t) + η_1(t)] + B_2 · [u_2(t) + η_2(t)] + c

where the constant term c accounts for gravity. From this equation it is possible to calculate the state vector x_eq and the control inputs u_1^eq and u_2^eq which balance the gravity forces. We then set x = x − x_eq, u_1 = u_1 − u_1^eq and u_2 = u_2 − u_2^eq, so that the term c disappears from the model. For simulation purposes, the model equations were discretised by using a first-order hold method, with a sampling interval ∆t = 1 ms over a movement duration of T = 2 s. We then obtained:

x(t + 1) = A · x(t) + B_1 · [u_1(t) + η_1(t)] + B_2 · [u_2(t) + η_2(t)]

After model discretisation, we added three extra state variables to store the positions of the target, x_T, and of the two via-points, x_{VP1} and x_{VP2}, so that the new state is x = [p_1, ṗ_1, f_1, ḟ_1, p_2, ṗ_2, f_2, ḟ_2, x_T, x_{VP1}, x_{VP2}]^T. In all simulations we took Σ_{η_i} = diag(1, 1) N², identical for both players.
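The discretisation step can be sketched as follows. The text uses a first-order hold; the sketch below uses the simpler zero-order-hold construction (the augmented-matrix exponential), which at ∆t = 1 ms gives a very similar result. `expm_taylor` and `discretise` are our own illustrative helpers, not the paper's code.

```python
import numpy as np

def expm_taylor(M, terms=20):
    # Matrix exponential via a truncated Taylor series; adequate here
    # because the entries of M*dt are small at dt = 1 ms.
    E = np.eye(M.shape[0])
    T = np.eye(M.shape[0])
    for n in range(1, terms):
        T = T @ M / n
        E = E + T
    return E

def discretise(A, B1, B2, dt=0.001):
    # Zero-order-hold discretisation via the standard augmented-matrix
    # identity: expm([[A, B], [0, 0]] * dt) = [[Ad, Bd], [0, I]].
    n = A.shape[0]
    B = np.hstack([B1, B2])
    M = np.zeros((n + B.shape[1], n + B.shape[1]))
    M[:n, :n] = A * dt
    M[:n, n:] = B * dt
    E = expm_taylor(M)
    Ad = E[:n, :n]
    Bd = E[:n, n:]
    return Ad, Bd[:, :B1.shape[1]], Bd[:, B1.shape[1]:]
```

The augmented-matrix trick handles singular A as well, since the top-right block is the series form of the input integral.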

Task and cost functionals
The task (reaching a target through a via-point) was specified in terms of the following cost functionals (i = 1, 2):

J_i = w_p · |p_i(T) − x_T|² + w_v · |ṗ_i(T)|² + w_vp · |p_i(tc_i) − x_{VPi}|² + Σ_t { w_f · |k · [p_{−i}(t) − p_i(t)]|² + r · |u_i(t)|² } · ∆t

The cost functional has five terms. The first two terms enforce stopping on the target at the end of the movement (small endpoint error, small endpoint velocity). The third term reflects the requirement to pass through the via-point at time tc_i (small via-point distance). The fourth term accounts for keeping the interaction force (proportional to the distance between players) low throughout the movement. The last term penalises the effort incurred during the movement. The weight coefficients determine the relative importance of the corresponding constraints. We set these weights by assuming (Bryson's rule) a maximum acceptable displacement (at the via-point and at the final target) equal to, respectively, the radius of the via-point (x_VP^max = 2.5 mm) and that of the target (x_T^max = 5 mm). We then set w_vp = 1/(x_VP^max)² and w_p = 1/(x_T^max)². We made similar normalisations for the 'velocity' weight w_v (calculated by assuming a maximum acceptable speed at the target of 5 mm/s), for the maximum inter-player distance (25 mm) and for the maximum activation (15 N).
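The weight normalisations can be reproduced directly from the maximum acceptable errors quoted above. In this sketch the names `w_f` and `w_u` are our own labels for the interaction-force and effort normalisations; in the model the effort weight is the normalisation scaled by the free parameter r.

```python
# Bryson's-rule weight normalisation (positions in metres).
x_max_vp = 0.0025   # via-point radius: 2.5 mm
x_max_t  = 0.005    # target radius: 5 mm
v_max    = 0.005    # maximum acceptable endpoint speed: 5 mm/s
d_max    = 0.025    # maximum inter-player distance: 25 mm
u_max    = 15.0     # maximum activation: 15 N

w_vp = 1.0 / x_max_vp ** 2   # via-point weight
w_p  = 1.0 / x_max_t ** 2    # endpoint-position weight
w_v  = 1.0 / v_max ** 2      # endpoint-velocity weight
w_f  = 1.0 / d_max ** 2      # inter-player distance normalisation
w_u  = 1.0 / u_max ** 2      # effort normalisation (scaled by r in the model)
```

Each weight is the inverse square of the largest error deemed acceptable, so a term contributes roughly 1 to the cost when its error sits at that boundary.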
The scalar coefficient r, the only free parameter in the model, specifies the trade-off between task-related accuracy and effort. With r ≫ 1, the optimal strategy is not to move at all. With r ≪ 1, the optimal strategy pays little attention to effort requirements. In all simulations we used r = 1.
The cost functional reflects all the instructions that we gave to the participants, including those that were not included in the score displayed during the experiments (for instance, reaching the target and stopping there). The cost functional also includes an additional essential requirement, minimizing the effort, which is biologically motivated and is implicit in any motor task. Both the score displayed during the experiment and the cost functional include a term related to via-point distance and another related to average inter-player distance, but with some differences. First, the cost functional is a quadratic form whereas the score function is a sigmoid: in the cost functional these terms are expressed as squared errors, whereas in the score they are expressed as absolute errors passed through a sigmoid. The relative weights of these terms are also different. In the score we set the ratio of interaction error to via-point error to 0.5, whereas in the cost functional this ratio is much lower (0.01). However, the effort minimization term also indirectly contributes to reducing the inter-player distance. Therefore, the cost functional used in simulation can be considered as functionally equivalent to the score function used in the experiments.

Calculation of optimal via-point crossing times
In the cost functional of Eq. 7, the via-point crossing times tc_1 and tc_2 are themselves part of the optimization. To calculate the optimal crossing times, we systematically varied tc_1 and tc_2 (between 10% and 90% of the total duration, set to 2 s in all simulations) over a square grid. For each crossing-time pair, we calculated the average magnitude of the optimal cost for both players in the dyad, J_1 and J_2. We then smoothed both the J_1 and J_2 mappings using a radial basis function approximation. In all subsequent simulations, as the optimal via-point crossing times tc_i we took the values corresponding to the Nash equilibrium (intersection of the reaction lines) calculated on the smoothed pair of cost functionals; see Figure A.
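The reaction-line intersection can be sketched on a grid with toy quadratic costs standing in for the smoothed J_1 and J_2 surfaces (which, in the paper, come from running the optimal controllers at each crossing-time pair); the best-response logic is the same.

```python
import numpy as np

# Toy stand-ins for the smoothed cost surfaces: each player prefers a
# different crossing time but pays for deviating from the partner's.
# The coefficients are illustrative, not fitted to the experiment.
grid = np.linspace(0.1, 0.9, 401)   # candidate crossing times (fraction of T)

def J1(t1, t2):
    return (t1 - 0.36) ** 2 + 0.5 * (t1 - t2) ** 2

def J2(t1, t2):
    return (t2 - 0.71) ** 2 + 0.5 * (t2 - t1) ** 2

def nash_on_grid(J1, J2, grid, iters=100):
    # Best-response iteration: each player in turn picks the grid value
    # minimising their own cost given the partner's current choice.
    # The fixed point is the intersection of the two reaction lines.
    t1, t2 = grid[0], grid[-1]
    for _ in range(iters):
        t1 = grid[np.argmin([J1(t, t2) for t in grid])]
        t2 = grid[np.argmin([J2(t1, t) for t in grid])]
    return t1, t2
```

For these quadratic costs the reaction lines are t1 = 0.24 + t2/3 and t2 = (1.42 + t1)/3, whose intersection (t1, t2) = (0.4475, 0.6225) the grid search recovers up to the grid resolution.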
The figure clearly indicates that there are indeed two cost function minima, corresponding to crossing VP_1 first and then VP_2 (tc_1 < tc_2) and vice versa. Given that VP_1 is closer to the start position than VP_2, the first solution requires less effort and is therefore the global optimum.
We ended up with crossing times of, respectively, 36% and 71% of the total movement duration (Nash condition) and 28% and 64% (No-partner condition). Hence the optimal crossing times differ slightly for both via-points across the two extreme conditions (Nash and No-partner). To simplify calculations, in the subsequent fictitious play simulations we used constant crossing times (those corresponding to the Nash condition); this is indeed sub-optimal, as the crossing times are expected to change at each iteration. To test the impact of this simplifying assumption, we ran additional simulations using the optimal crossing times corresponding to the No-partner condition. We found slightly different values of the final minimum distance, interaction forces and leadership indices, but the main prediction (as information increases, the learned strategy comes closer to the Nash equilibrium) did not change.

Nash controllers
The optimal Nash feedback controllers can be determined through the following iterative algorithm [2]. For t ← (T − 1) down to 0, solve simultaneously, for i = 1, 2, the coupled equations:

L_i(t) = [R_i + B_i^T · P_i(t + 1) · B_i]^{−1} · B_i^T · P_i(t + 1) · [A − B_{−i} · L_{−i}(t)]

P_i(t) = Q_i(t) + L_i(t)^T · R_i · L_i(t) + Ā(t)^T · P_i(t + 1) · Ā(t), with Ā(t) = A − B_1 · L_1(t) − B_2 · L_2(t)

starting from the terminal condition P_i(T) = Q_i(T). In the above equations and in all the following, i denotes a player and −i denotes his/her partner.
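A hedged sketch of this backward recursion, in the spirit of [2]: at each step the two gain equations are coupled, so the sketch resolves them with a short inner fixed-point iteration. The exact formulation in the paper may differ in detail.

```python
import numpy as np

def nash_gains(A, B1, B2, Q1, Q2, R1, R2, T):
    # Backward recursion for the feedback-Nash gains of a two-player
    # discrete-time LQ game. Q_i and R_i are taken time-invariant here
    # for simplicity; gains[t] holds (L1, L2) for step t.
    n = A.shape[0]
    P1, P2 = Q1.copy(), Q2.copy()        # terminal costs
    gains = []
    for _ in range(T):
        L1 = np.zeros((B1.shape[1], n))
        L2 = np.zeros((B2.shape[1], n))
        for _ in range(50):              # coupled gain equations: fixed point
            L1 = np.linalg.solve(R1 + B1.T @ P1 @ B1,
                                 B1.T @ P1 @ (A - B2 @ L2))
            L2 = np.linalg.solve(R2 + B2.T @ P2 @ B2,
                                 B2.T @ P2 @ (A - B1 @ L1))
        Abar = A - B1 @ L1 - B2 @ L2
        P1 = Q1 + L1.T @ R1 @ L1 + Abar.T @ P1 @ Abar
        P2 = Q2 + L2.T @ R2 @ L2 + Abar.T @ P2 @ Abar
        gains.append((L1, L2))
    gains.reverse()                      # gains[t] for t = 0 .. T-1
    return gains
```

A useful sanity check: if one player has no control authority (B_2 = 0), the recursion collapses to the standard single-player Riccati recursion for the other player.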

Optimal 'no-partner' controllers
A second (sub-optimal) scenario is represented by the situation in which each player assumes that his/her partner is inactive, i.e. u_{−i}(t) = 0.
In this case, the optimal controllers are calculated independently, as two separate LQG optimal control problems. For t ← (T − 1) down to 0:

L_i(t) = [R_i + B_i^T · P_i(t + 1) · B_i]^{−1} · B_i^T · P_i(t + 1) · A

P_i(t) = Q_i(t) + L_i(t)^T · R_i · L_i(t) + [A − B_i · L_i(t)]^T · P_i(t + 1) · [A − B_i · L_i(t)]

The above algorithm is separately applied to both players, i.e. for i = 1, 2.
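The 'no-partner' backward pass is a standard finite-horizon LQR recursion, sketched below under the same notation (again with time-invariant Q and R for simplicity).

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    # Finite-horizon LQR backward pass: the 'no-partner' controller, in
    # which each player treats the partner's input as zero.
    P = Q.copy()
    gains = []
    for _ in range(T):
        L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + L.T @ R @ L + (A - B @ L).T @ P @ (A - B @ L)
        gains.append(L)
    gains.reverse()                      # gains[t] for t = 0 .. T-1
    return gains
```

Run once per player with that player's own B_i, Q_i and R_i; the two computations do not interact.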

For the H and VH groups, the sensory information available to player i consists of his/her own position and velocity, the interaction force, and the stored target and via-point positions, y_i = [p_i, ṗ_i, k · (p_{−i} − p_i), x_T, x_{VPi}]^T, which defines the observation matrix H_1. A similar expression is found for H_2.
The participants in the PV group are assumed to also see their partner's position, so that the sensory information is defined as y_i = [p_i, ṗ_i, p_{−i}, ṗ_{−i}, k · (p_{−i} − p_i), x_T, x_{VPi}]^T and H_1 and H_2 are modified accordingly.
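Building the observation matrices amounts to selecting (and linearly combining) state components. The sketch below constructs H_1 for the PV group with one coordinate per player; the state ordering is an illustrative guess, not the paper's exact layout.

```python
import numpy as np

k = 150.0   # virtual-spring stiffness, from the text

# Assumed state ordering (one coordinate per player, plus the three stored
# positions): [p1, v1, f1, g1, p2, v2, f2, g2, xT, xVP1, xVP2].
def H_player1_PV(n=11):
    def e(j):
        r = np.zeros(n)
        r[j] = 1.0
        return r
    rows = [e(0), e(1)]            # own position and velocity
    rows += [e(4), e(5)]           # partner position and velocity (PV group only)
    force = np.zeros(n)
    force[4], force[0] = k, -k     # interaction force k*(p2 - p1)
    rows.append(force)
    rows += [e(8), e(9)]           # target and own via-point
    return np.vstack(rows)
```

Dropping the two partner rows gives the corresponding H/VH observation matrix.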

Partner model
The partner's control input was estimated as part of the state observer. We made the prior assumption that the partner's input is described by low-pass filtered white noise:

û_{−i}(t + 1) = A_u · û_{−i}(t) + ε_{−i}(t)

where ε_{−i} is Gaussian with covariance Σ_{ε_{−i}}. In all simulations, we set A_u = 1 and Σ_{ε_{−i}} = 1 N².
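The partner-input prior can be sketched as a first-order autoregressive process; with A_u = 1 it reduces to a random walk whose increments carry the stated variance. The simulation below is only an illustration of the prior itself, not of the full augmented observer.

```python
import numpy as np

# Prior on the partner's input: low-pass filtered white noise,
# u_hat(t+1) = A_u * u_hat(t) + eps(t). With A_u = 1 this is a random
# walk; in the model the observer augments the state with u_hat and lets
# the Kalman gain correct it from the sensed interaction force.
rng = np.random.default_rng(0)
A_u = 1.0
sigma_eps = 1.0   # N, per the text

def simulate_prior(steps, A_u=A_u, sigma=sigma_eps):
    u = np.zeros(steps)
    for t in range(steps - 1):
        u[t + 1] = A_u * u[t] + sigma * rng.standard_normal()
    return u
```

Choosing A_u < 1 would instead encode a belief that the partner's input decays toward zero between observations.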