Addressing joint action challenges in HRI: Insights from psychology and philosophy



Introduction
In the coming decades, human societies will witness a pervasive use of robotic agents in all contexts of public and private social interaction. Social robotics is currently designing and producing robotic agents for use in numerous contexts, such as game companions (Sanghvi et al., 2011), education and therapy (Brage et al., 2018; McGlynn et al., 2014), or services (Kanda et al., 2010). To continue these advances, social robotics needs to design robots able to engage with humans, so that they can collaborate on shared activities that require high levels of coordination. This need explains the rapid expansion of the field of human-robot interaction (HRI), which seeks to develop different avenues for enabling robots to engage in social interactions. As part of this expansion, HRI research has drawn on important findings in psychology, philosophy of mind, and neuroscience to provide robotic agents with the cognitive capabilities necessary for achieving joint actions (Giger et al., 2019; Thomaz et al., 2016).
This approach in social robotics focuses on equipping robots with capacities based on the human psychological mechanisms underlying shared activities. Some of the mechanisms that roboticists have attempted to design include theory of mind, emotion recognition, and human-aware navigation (see Thomaz et al., 2016, for a review). Despite these advances, research in HRI suggests that equipping robots with social skills can sometimes, rather counterintuitively, undermine user experience and hinder the interaction between humans and robots (Giger et al., 2019; Sciutti et al., 2018). For instance, a robot's human-like appearance or personality can be perceived as deceptive (Vandemeulebroucke et al., 2018): human-robot interactions may be undermined by the novelty effect or by the expectation gap between what people believe about the robot (especially when they have had no contact with robots and their expectations are shaped by popular culture; see, e.g., Sandoval et al., 2014) and the robot's actual competence (see Kwon et al., 2016; see also Section 2.3.2). Furthermore, some of the robot's attributes or behaviors, such as head-nodding (Thepsoonthorn et al., 2021), may trigger attributions of mind (Gray & Wegner, 2012), leading to a feeling of strangeness or unfamiliarity (known as the uncanny valley effect) that can affect humans' level of trust toward the robot (Lewis et al., 2018). These negative effects may be amplified by the implementation of certain social capacities and behaviors, especially when they are implemented in isolation. For instance, Thepsoonthorn et al. (2021) found that the feeling of uncanniness related to head-nodding does not appear when the behavior is accompanied by a manual gesture. Also, Riek et al. (2010) found that certain types of cooperative behaviors are more effective than others, and that negative attitudes toward robots are strongly correlated with a decreased ability to decode human gestures. The choice to focus on some socio-cognitive mechanisms in isolation usually goes hand in hand with specialization in the field, since specialization typically requires researchers to "dissect" particular processes or situations. This seems necessary for gaining expertise; however, it also raises important questions for social robotics and its attempt to equip robots with the capacity to carry out joint action with humans.1 Are we gaining expertise at the expense of considering the "broad picture"? Are we missing something when focusing on specific socio-cognitive capacities? Are there other strategies worth exploring to ensure fluent interaction with robots? And are there general strategies for influencing human attitudes during HRI?
The aim of this paper is twofold. First, we argue that the sub-optimal or even negative effects of considering and designing specific capacities in isolation can be overcome if we focus on the communicative capacities seen in human interactions, which can be implemented in the context of joint action for HRI. Second, we review the psychological and philosophical literature exploring these human communicative capabilities in order to provide some exploratory ideas for design strategies in HRI.
The paper is structured as follows. In Section 2, after a brief overview of the notion of joint action and its underlying mechanisms, we present some well-known challenges that social robotics faces in establishing interpersonal relations between humans and robots. We suggest that these challenges can be addressed by emphasizing the communicative mechanisms present in joint action, including those found in human-human interactions. In Section 3, we review studies in developmental psychology, cognitive psychology, and philosophy of mind and language to show how this literature can help meet some of the challenges in HRI. We describe several mechanisms that may offer perspectives for social robotics in order to equip robots with robust communication capacities for joint action.

How can we define joint action?
A large number of social interactions and encounters are encompassed by the notion of joint action. Broadly construed, joint action is any form of social interaction whereby two or more agents coordinate their actions in order to pursue a joint goal. However, the notion of joint action has been subject to debate in philosophy and psychology. For instance, according to Sebanz et al. (2006), joint actions require the partners to coordinate "their actions in space and time to bring about a change in the environment" (p. 70), while other authors (Carpenter, 2009; Cohen & Levesque, 1991; Fiebich & Gallagher, 2012; Tomasello et al., 2005) resist the idea that instances of mere coordination (e.g., two partners walking side by side) constitute a joint action, holding that joint action requires further conditions, such as shared goals and intentions.
Moreover, while some authors use the notion of joint action interchangeably with the notions of collaboration or cooperation (Becchio et al., 2010; Kobayashi et al., 2018), others (Amici & Bietti, 2015; Chalmeau & Gallo, 1995) establish a hierarchy of interactions depending on the processes involved. According to Amici and Bietti (2015), for example, coordination is a fast, low-level process of behavioral matching and interactional synchrony, which may, but need not, facilitate middle-level processes like cooperation (where some individuals bear certain costs to provide benefits to others) or high-level processes like joint action, which requires further resources such as turn-taking and the alignment of linguistic resources during dialogue.

What is necessary for joint action?
Leaving aside the debate on the concept of joint action, we focus on the mechanisms that make joint actions possible. Three interrelated mechanisms appear to be key conditions for joint action: coordination, planning, and motivational alignment, each supported by other processes. A great deal of conceptual and empirical work has investigated these processes (Knoblich et al., 2011; Pacherie, 2012; Vesper et al., 2016), from the sharing of common ground to the anticipation of a partner's actions by way of emergent coordination (Curioni et al., 2017).
More specifically, intentional coordination (sometimes referred to as planned coordination; Curioni et al., 2017) requires the partners: (i) to represent their own and others' actions, as well as the consequences of these actions, (ii) to represent the hierarchy of sub-goals and sub-tasks of the plan, (iii) to generate predictions of their joint actions, and (iv) to monitor progress toward the joint goal in order to compensate for or help others achieve their contributions where needed (Pacherie, 2012).
Indeed, joint action often involves planning several aspects, which requires representing the goal, the whole plan, and/or even the sequence of actions to be performed. The formation of these representations could rely on different mechanisms, including high-level processes such as theory of mind, team reasoning, or verbal negotiation (Bratman, 2014), or lower-level processes such as minimally representing the joint action goal and knowing that it will be achieved with others (Michael & Székely, 2019; Vesper et al., 2010). One example of the mechanisms involved in planning a joint action is task co-representation, which allows individuals to represent the details of each other's task. Several studies have demonstrated that people tend to represent others' tasks even when doing so is detrimental to their own task performance (Eskenazi et al., 2013; Sebanz et al., 2003, 2005). This capacity appears early in development; for instance, 5-year-old children can incorporate their partner's role into their own action plan during a joint activity, as shown by the appearance of a joint Simon effect2 (see Saby et al., 2014).
1 The concerns raised by this otherwise necessary compartmentalisation of work not only stem from the experience of some of us but also reflect a recurring debate in social robotics (see, e.g., Belhassein et al., 2020; Menezes et al., 2014; Seibt et al., 2020; Young et al., 2011).
K. Belhassein et al.
These representations allow individuals to generate predictions about each other's actions, which in turn facilitates mutual adjustment and coordination between the partners. Interestingly, individuals can also facilitate the others' and their own predictions by communicating relevant and reliable information for joint action. The objective is to make actions more transparent and predictable so that decision-making during the interaction can be fluent and successful. A notable example of these mechanisms is sensorimotor communication: several studies suggest that actors exaggerate their movements or kinematic parameters to make their actions more understandable to partners (Sacheli et al., 2013; Vesper & Richardson, 2014). Besides such implicit communicative devices, participants in joint action often negotiate sub-tasks, sub-goals, or ways of proceeding with the collective task on the fly, through different explicit exchanges (Clark, 1996). Finally, communicative mechanisms improve coordination and prediction not only by providing relevant information about the specific course of action, but also by providing information that affects the partner's motivation, for instance by motivating the other to remain engaged in the joint task or by fulfilling others' expectations (Heintz et al., 2015; Michael & Székely, 2018).
A substantial body of research in psychology and philosophy thus suggests that these communicative processes play a fundamental role in joint action. One of our goals is therefore to highlight the significance of these devices for social robotics: without them, it would be impossible to transfer the flexibility and efficiency of human interactions to HRI.

Attempts and success in social robotics
A highly influential view in social robotics holds that developing likable robots, or robots whose appearance imitates human physical features, can improve users' experience and thus the interaction. On this view, social robotics must aim to build robots whose appearance and behavior appeal to positive human emotions (e.g., curiosity or likability). Examples of robots instantiating this strategy are Jibo (Jibo, Inc.) and Pepper (SoftBank Robotics), whose rounded forms give them a pleasant and friendly appearance. We also find humanoid robots like Erica or Geminoid, developed by Ishiguro and his colleagues (Glas et al., 2016; Nishio et al., 2007), which are designed to resemble humans physically as closely as possible. This general strategy of taking advantage of human preferences, inclinations, or curiosity is not restricted to physical features.
Several labs (e.g., Breazeal, 2002; Craig et al., 2010; Kishi et al., 2013; Oberman et al., 2007; Wendt et al., 2008; Wendt & Berg, 2009) are equipping robots with different emotional expressions in order to prompt empathy or pro-social attitudes in the human agent, so that robots "could potentially tap into the powerful social motivation system inherent in human life, which could lead to more enjoyable and longer-lasting human-robot interactions" (Oberman et al., 2007, p. 2195). For instance, Craig et al. (2010) demonstrated that facial expressions of the BERT humanoid robot can elicit the same neuronal responses as human facial expressions in an emotion recognition task. Robotic emotional expressions can be implemented in very different ways, which are likely to operate as communicative cues. For instance, while many works involve facial expressions on anthropomorphic faces (Ahn et al., 2012; Kedzierski et al., 2013; Lütkebohle et al., 2010), other implementations involve posture (Breazeal et al., 2007), body motion (Kishi et al., 2013), or pace patterns (Karg et al., 2010). Moreover, pro-social attitudes and emotional states can be elicited through eye gaze, gestures, or speed of movement. Ham et al. (2015) demonstrated that human users find robots more persuasive when the robots orient their eyes and heads toward them, and Riek et al. (2010) showed that equipping robots with cooperative gestures (beckon, give, shake hands) influences human motivation for joint action. Finally, Wendt et al. (2008) elicited different human emotional states (stress, boredom, surprise, and perplexity) with a robot arm in a LEGO block-building task by varying its speed and timing intervals.
Moreover, some of a robot's gestures or legible motions can also improve prediction in joint action through kinematic signaling (Beetz et al., 2010; Dehais et al., 2011; Dragan et al., 2015; Holladay et al., 2014; Huang & Thomaz, 2010; Kruse et al., 2013; Lichtenthäler & Kirsch, 2013; May et al., 2015; Riek et al., 2010; Sisbot & Alami, 2012; Trujillo et al., 2019). For example, Holladay et al. (2014) proposed a mathematical model that generates pointing configurations making the target object easier for novice users to identify. Similarly, Dragan et al. (2015) demonstrated that legible motions, planned to express the robot's intent, lead to more fluent collaboration than motions planned to match the partner's expectations or than purely functional motions. These findings are consistent with other studies showing that stereotypical motions, along with straight lines and additional gestures (see Lichtenthäler & Kirsch, 2013, for a review), are pivotal factors for legible robot behavior. In the same vein, different strategies have been explored in connection with the anticipation of motions (Coovert et al., 2014; Szafir et al., 2015; Triebel et al., 2016). For instance, Khambhaita et al. (2016) equipped Spencer, a socially aware service robot, with the capacity to anticipate its next movement by looking toward the target direction, improving social motion and avoiding collisions, while Coovert et al. (2014) (see also Chadalavada et al., 2015) used projected visual arrows and a simplified map to communicate intended movements. Finally, Huang and Thomaz (2010) demonstrated the impact of soliciting gestures and gaze on establishing joint attention between humans and robots.
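The notion of legibility invoked above can be made concrete with the cost-based goal-inference model commonly used in the legible-motion literature: an observer is assumed to infer a robot's goal G from the partial trajectory S→Q, with P(G | S→Q) proportional to exp(V_G(S) − V_G(Q) − C(S→Q)) · P(G), where C is the cost already incurred and V_G is the cheapest remaining cost to G. The Python sketch below is illustrative only and is not the implementation of any study cited here; it assumes that cost is Euclidean path length (so V_G is straight-line distance), a uniform prior, and two hypothetical goal positions. It shows why a motion that exaggerates its deviation toward the intended goal, much like the exaggerated kinematics of human sensorimotor communication, makes that goal easier to infer early on.

```python
import math

def path_length(points):
    """Total Euclidean length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def goal_posterior(partial_path, goals, prior=None):
    """Return P(goal | observed partial path) for each candidate goal.

    Assumed model (a sketch, not any cited study's code):
      log P(G | S->Q) = -C(S->Q) - V_G(Q) + V_G(S) + log P(G) + const,
    with C = Euclidean path length and V_G = straight-line distance to G.
    """
    start, current = partial_path[0], partial_path[-1]
    if prior is None:
        prior = [1.0 / len(goals)] * len(goals)  # uniform prior over goals
    travelled = path_length(partial_path)
    log_scores = [
        -travelled - math.dist(current, g) + math.dist(start, g) + math.log(p)
        for g, p in zip(goals, prior)
    ]
    # Normalize in log space (a softmax) for numerical stability.
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    start = (0.0, 0.0)
    goals = [(-1.0, 2.0), (1.0, 2.0)]  # hypothetical left and right goals
    predictable = goal_posterior([start, (0.5, 1.0)], goals)  # straight toward right goal
    legible = goal_posterior([start, (1.0, 1.0)], goals)      # exaggerated toward right goal
    print(f"straight path:    P(right goal) = {predictable[1]:.2f}")
    print(f"exaggerated path: P(right goal) = {legible[1]:.2f}")
```

With these hypothetical positions, the point halfway along the straight path gives the right goal a probability of about 0.66, whereas the exaggerated point (1, 1) raises it to about 0.77. Because the travelled cost C(S→Q) is identical for every goal, it cancels under normalization; it is kept only to mirror the formula.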
Of course, communication is bidirectional, and thus requires not only producing signals and providing information but also understanding and interpreting others' signals. In this sense, substantial effort has been devoted to equipping robots with capacities for understanding different human social cues, including gaze, gestures, and facial expressions (e.g., Alazrai & Lee, 2012; Benamara et al., 2019; Boucenna et al., 2014; Burger et al., 2012; Liu & Wang, 2018). For example, Droeschel et al. (2011) [...]
Now, how can these efforts be oriented toward designing more efficient social robots, able to engage in joint action? The aforementioned capabilities, such as recognizing human gestures and facial expressions or producing eye gaze, aim to reduce different types of uncertainty and thus foster readiness to interact, by providing different pieces of information that become common knowledge. However, despite the enormous advances in equipping robotic agents with socio-cognitive capacities, attempts to establish better mutual understanding between humans and robots have not always been successful. Recent studies in HRI have shown that building robots with social "ingredients" does not necessarily improve the interaction or the human experience with the robot (Giger et al., 2019; Sciutti et al., 2018).

Obstacles and challenges in social robotics
Empirical literature assessing users' experience with robots has identified several elements related to human motivation that may undermine interactions between human and robotic agents. Different studies in HRI have suggested that humans are not always inclined to interact with robotic agents. An often-voiced problem is the well-known uncanny valley effect (see Wang et al., 2015, for a review), the phenomenon whereby humans experience a feeling of discomfort or revulsion when perceiving a machine or artifact that acts or looks like a human. The effect was first described by Mori (1970) and has since been observed not only in human adults but also in children (Yamamoto et al., 2009). Several studies have suggested that the uncanny valley is often associated with the robot's appearance (Wang et al., 2015). However, other studies suggest that the effect may also be associated with certain non-verbal behaviors (Thepsoonthorn et al., 2021) or, more generally, with any cue from which one may infer that the robot has a mind (Gray & Wegner, 2012), which suggests that not only appearance but also the implementation of certain actions and skills in robots may have drawbacks for HRI. Experiencing negative feelings and discomfort when perceiving or interacting with a robot can interfere with humans' motivation to engage in social interactions with robots. This could be especially important in contexts where a close relationship between the human and the robot must be established, for instance with robotic companions assuming roles in elder care, teaching, childcare, or therapy. In fact, as Hoffman (2020) has suggested, the uncanny valley effect can take a different form in these contexts, namely, "a resistance to accept the social roles and behaviors of a robot, especially in companionate and relational settings" (p. 535). This social uncanniness may undermine the motivation to interact with robots and, although it does not necessarily mean abandoning the collective action, it can fuel negative attitudes that could be detrimental to HRI.
Other potential negative attitudes toward robots involve implicit biases and low levels of trust. First, in a series of experiments using implicit association tests, which measure reaction times depending on the associations between a target (a robot or a human) and positive or negative attributes, Sanders et al. (2016) found that participants exhibit implicit negative attitudes toward robots, even when they explicitly express positive assessments of them. These experiments suggest that people hold certain negative attitudes toward robots, or at least less positive stances than those they take toward humans. Second, our everyday interactions with humans require an appropriate and often substantial level of trust. Many social transactions make us vulnerable to others' actions, to the extent that their outcome depends on the expectation that the other party will behave as it should. However, we have reasons to believe that several characteristics of HRI may influence this level of trust. For instance, in a recent meta-analysis, Hancock et al. (2011) suggest that the level of trust in robots may be influenced by performance-based factors (reliability, false alarms, or failure rates) and robot attributes (proximity, anthropomorphism, or personality). These factors can undermine the interaction in different ways. On the one hand, a robot with a pleasant personality or anthropomorphic features may induce a high level of trust, which may lead humans to form unrealistic expectations of the robotic agent, over-rely on it, and consequently reduce their monitoring of some aspects of the interaction. This "expectation gap", once confronted with the real capabilities of the robotic agent (see also Kwon et al., 2016), may undermine the interaction itself and thus reduce the user's motivation to engage with robots on subsequent occasions. On the other hand, an initially low level of trust may produce resistance to starting an interaction with the robot or lead the user to abandon the collaborative task, ending in disuse of the robot (Lewis et al., 2018). A lack of trust is especially problematic in work contexts where a major part of the main task or relevant sub-tasks must be carried out by a mechanical agent. Imagine, for example, working on an assembly line where the mechanical arms are unreliable, or in a commercial context where a robotic colleague does not carry out its tasks reliably.
Summing up, several features in HRI can produce aversive and negative attitudes or influence the level of trust, which can damage the motivation of the human to interact with robotic agents.
In addition to motivational aspects, interaction with robots may also be impaired by prediction issues. Ideomotor theory, which postulates that we initiate our actions by predicting their effects in the environment (James, 1890; Shin et al., 2010; Stock & Stock, 2004), laid the foundations for the idea that the mental representation of an action or its effects activates the motor codes for that same action and thus creates a tendency to perform it (Sebanz & Knoblich, 2009). Similarly, observing an action performed by others activates our own motor system (Heyes, 2011). This phenomenon of motor resonance was first evidenced by studies demonstrating the existence of mirror neurons in monkeys (Gallese et al., 1996; Rizzolatti et al., 2001) and humans (Heyes, 2010; Kilner et al., 2009), which are active both during the execution of a movement and during the observation of the same action performed by others. In the same way, the execution of an action can be hindered by observing another action performed by others, through motor contagion (Bouquet et al., 2011). This activation of our own motor system during the planning, mental representation, or observation of an action (Prinz, 1997) is notably explained by the theory of event coding (Hommel, 2009), which postulates that actions are represented via networks called event files, corresponding to the different features of the perceived effects of these actions. Likewise, we are thought to predict the effects of our actions before executing them through generative or integrative forward models (Pesquita et al., 2018; Wolpert et al., 2003). Importantly, the prediction of actions and their effects appears to be essential for successful joint action (Curioni et al., 2019; Sebanz et al., 2006), among other things by enabling interpersonal coordination during motor interactions (Sacheli et al., 2021). Finally, a study by Vesper and Richardson (2014) showed that even without direct perception of the other's action, partners involved in a joint action successfully predict their co-agent's action and incorporate this prediction into their own action planning in order to complete the task.
Several studies have investigated the influence of the agent's nature on motor resonance, in particular when the agent is a robot. Since humans must interpret a robot's motions and adapt their behavior to collaborate efficiently and safely, studying motor resonance in the context of HRI seems particularly relevant, especially for investigating humans' unconscious responses to robotic agents (Sciutti et al., 2012). While some studies have found no motor contagion during the observation of robotic motions (Kilner et al., 2003; Tai et al., 2004), and therefore concluded that the mirror system is not activated by the observation of a robot's mechanical (non-biological) motion (Press et al., 2005), other imaging studies have qualified this conclusion. In some cases, the mirror neuron system seems to be activated during the observation of mechanical motions performed by non-human agents (Gazzola et al., 2007 [...] et al. (2003) showing no motor resonance during the observation of a robotic agent, but this time using a humanoid robot, Atkeson et al. (2000) highlighted a phenomenon of motor interference identical to the one appearing during human interactions, i.e., the observation of an incompatible action (carried out by the robot) disturbed the one the participants were executing. Motor contagion has also been found with a humanoid robot (Bisio et al., 2014), for movements whose trajectory followed biological kinematics. In addition, the acquisition of bidirectional action-effect associations (an action and its motor codes linked to the effects of this action in the environment) through observation, enabled by the motor resonance mechanism, was found to be possible when observing a simple virtual agent presented on a computer screen (Belhassein et al., 2021). Finally, an EMG study investigated how the appearance and type of motion of an agent can influence the muscle activity of people observing or imitating videos (Hofree et al., 2015). More precisely, the authors demonstrated that motor simulation occurs during the observation and imitation of all types of agents (humans, and robots with mechanical or biological appearance), albeit with a stronger effect for human agents. In short, there is no consensus about whether motor resonance mechanisms can be successfully used to generate predictions about a robot's behavior.
Taken together, despite the considerable advances of social robotics in equipping robots with social skills, these empirical findings show that several factors can hinder the interaction between humans and robots, affecting in particular users' motivation and the possibility of prediction, two pivotal requirements for joint action. Motivation is indeed fundamental for partners to engage in the collaborative task, but also to remain engaged when more desirable individual options appear. Successful joint actions also require partners to coordinate efficiently in space and time and to adjust their behaviors to each other to reach the shared goal. Faced with the challenge of designing strategies to overcome or compensate for these factors in the context of joint action, social robotics has already tackled several issues. Developing this expertise requires focusing on specific capacities of robotic agents, displayed in specific contexts. As a consequence, research goals in HRI are sometimes restricted to solving a very particular problem (e.g., avoiding collisions, anticipating a movement, or using and tracking gaze) arising in the course of joint action, which usually implies considering that capacity in isolation and under laboratory conditions that do not always represent real interaction conditions. However, as we have seen above, joint action situations involve several types of uncertainty, including some that can emerge before the interaction itself.
Therefore, it appears necessary to go beyond design elements that serve an instrumental purpose for a given task and to consider the robot as a whole as a social agent in its own right, one that must proactively contribute to mutual understanding between the participants in the joint task. In our view, such objectives can be reached by focusing on the different communicative mechanisms involved in joint action, as alternative ways of sustaining motivation and re-establishing routes of prediction. In the following section, we review different findings and proposals in philosophy and psychology to provide exploratory ideas on some communicative strategies that could be implemented in HRI.

Why focus on communication?
An important part of the human psychological devices involved in joint action is communicative, serving different purposes (e.g., negotiating, guiding, questioning; Austin, 1962; Clark, 1992; Sperber & Wilson, 1995) and mobilizing different types of information. This flexibility allows us to provide information not only about the relevant objects involved in a task, but also about the emotional or cognitive states of the participants. In the HRI context, this can help us explore different routes or strategies to overcome the problems presented above, which involve some degree of uncertainty at the different stages of joint action.
According to Michael and Pacherie (2015), participants can face three sources of uncertainty during joint action, which can overlap and influence each other. First, motivational uncertainty refers to not knowing whether the partner is motivated to engage in the overall joint action, a particular goal, or a sub-goal, or how strongly she is motivated. Second, instrumental uncertainty refers to not knowing the other participant's instrumental beliefs about how to proceed, which roles to assume, or when and where to act. Finally, common ground uncertainty emerges when instrumental beliefs and motivations are not mutually manifest. Thus, even if the participants share a goal or agree on how to proceed, they might not know that this is the case. Any communicative act or strategy aims to reduce common ground uncertainty by making mutually manifest a piece of information, which can concern instrumental or motivational states, aspects of the environment, goals, or other information relevant to achieving the joint action. In a minimal sense, then, communicative strategies can be defined as overt stimuli, whether verbal or non-verbal, generated to activate, add to, or update the common ground and knowledge related to a particular joint action.
On this basis, we contend that interacting with robots able to exhibit such communicative behavior can affect both motivation and prediction in the context of joint action. By making a piece of information (e.g., social roles) mutually manifest, communication can influence humans' attitudes toward robots as potential partners, eliciting pro-social motivation, and also improve coordination and prediction by establishing several lines of understanding between robots and humans. Designing communicative strategies can thus serve several functions likely to improve joint action. Studies in philosophy and psychology offer descriptions that may serve as guidelines for these strategies, including the notions of joint attention, commitments, and motivational alignment. The next section presents these potential guidelines, articulates how they help reduce certain uncertainties (and thus elicit and facilitate coordination), and emphasizes their direct relevance for HRI. Although we are aware that human-robot joint actions mainly involve non-verbal communicative strategies, we have chosen to emphasize verbal communication as well in some of the following sections, as it is a modality that is both powerful and spontaneous in humans. Many of the theories and notions that describe the dynamics of verbal communication (e.g., speech act theory) are also often used to characterize non-verbal behavior. Furthermore, as discussed above, there are reasons to believe that the uncanny valley effect caused by some non-verbal signals could potentially be countered by verbal communication accompanying the non-verbal behavior.

Joint attention and common ground
To perform a joint action, partners need a common goal. Moreover, joint action requires that individuals plan and perform their actions according to their predictions about the other's actions in order to reach this goal. Joint attention is key for this purpose, as it allows partners to establish and share a perceptual common ground, which is necessary both to initiate a joint action and, for individuals already engaged in one, to coordinate successfully. Commonly defined as the ability to coordinate our attention to the same object of interest (e.g., Bakeman & Adamson, 1984), joint attention enables us to integrate others' attentional focus and therefore to experience the world together (Tomasello, 1999). It is a key element of social cognition, playing a crucial role in "being and acting together" (Tomasello & Carpenter, 2007).
The study of joint attention began with Bruner and collaborators in developmental psychology (Bruner, 1974; Scaife & Bruner, 1975). In a study with infants aged 2 to 14 months, Scaife and Bruner (1975) showed that the ability to follow and share attention with others increases with age. Butterworth and Jarrett (1991) identified three successive mechanisms involved in joint attention over the course of development: first, an "ecological" mechanism by which children correctly follow the general direction of their parent's gaze but cannot identify the target if there are two in their visual field; second, a "geometric" mechanism, present by 12 months, that allows the child to follow the other's gaze precisely and identify the target's location; and finally, a "representational" mechanism that appears between 12 and 18 months, thanks to which children can understand, by forming a mental representation, that their parent's gaze is directed toward a target outside their visual field.
These episodes of joint attention, which at first are mostly initiated by the adult (Matthews et al., 2012), allow the child to progress during development from simply sharing attention in response to the adult's solicitations to directing others' attention by initiating and sustaining joint attention (Carpenter et al., 1998; Mundy & Jarrold, 2010). Numerous studies have investigated the role of these joint attention processes in the development of other social and communicative abilities. In particular, children's ability to produce pointing gestures, which reflects their access to referential communication (e.g., Butterworth, 2003), has been shown, first, to rely on the ability to develop joint attention and, second, to contribute to its increasing complexity (Mundy, 2003; Vaughan Van Hecke et al., 2007). Joint attention has also been shown to predict the development of language skills: in a study with 6- to 18-month-old children, Morales et al. (1998) showed that 6-month-olds' ability to follow another's gaze predicts a larger vocabulary at 18 months. Joint attention skills thus underpin the development of social cognition: they allow children, through an active role in the dynamics of social interaction (see Kidwell et al., 2007), to integrate the other as an attentional and intentional agent (Tomasello & Farrar, 1986). This founding role is particularly visible in atypical development, for instance in children with autism, who have difficulty communicating both verbally and non-verbally (Sigman et al., 1986) and who also show limited joint attention skills (see Bruinsma et al., 2004 for a review).
Several authors have added complexity to this picture by going a step further in defining joint attention. While for some of them two agents orienting their attention toward the same referent is a sufficient criterion for joint attention (Butterworth, 1998; Butterworth & Jarrett, 1991), others have highlighted the need (1) to develop mutual knowledge of this coordinated attention and (2) to represent the other agent's intentional states (Carpenter & Liebal, 2011). Tomasello (1995), for example, defined joint attention as "people experiencing the same thing at the same time, and knowing together that they are doing this". Accordingly, joint attention can be explained either by basic learning mechanisms (lean joint attention) or as "the result of particular cognitive operations or second-order representational competencies" (Racine, 2011), i.e., what Tomasello and Carpenter (2007) called socio-cognitive abilities of shared intentionality (rich joint attention). The opposition between "lean" and "rich" views of joint attention may parallel the contrast between current research in HRI, which tends to focus on "surface behaviors" (Kaplan & Hafner, 2006) such as simultaneous looking or coordinated behaviors, and research in developmental psychology showing that joint attention encompasses a variety of socio-cognitive processes.
Examining the notions of common ground (or common knowledge) and mutual recognition may help reduce this gap, as they have also played a fundamental role in the definition of joint action in both philosophy and psychology (Alonso, 2009; Bratman, 1992, 1993; Clark, 1996; Cohen & Levesque, 1991; Lewis, 1969; Miller, 2001; Tollefsen, 2005; see also Blomberg, 2016 for criticism). First, Lewis (1969) claims that a proposition P is commonly known among two agents if P is known by both agents and each agent knows that the other can draw the same conclusions from P that she herself can. In another well-known formulation (see Schiffer, 1972; Thomas et al., 2014), common knowledge must be understood as a recursive structure in which S knows P, Y knows P, S knows that Y knows P, Y knows that S knows P, S knows that Y knows that S knows P, and so on. The subject does not necessarily represent the whole line of reasoning beforehand but should be able to infer it. Thus, we can assume that, from the individual point of view, common knowledge or common ground is the information that one may reasonably assume both oneself and one's partner know, and that each can know or infer that the other knows. For our purposes, such information may include goals and sub-goals, intentions (see Bratman, 1992), ways to proceed, instrumental beliefs, facts about the environment, appropriate scripts and roles, and any other type of information necessary or relevant for the successful completion of the joint action.
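The recursive formulation of common knowledge can be written compactly in the notation of standard epistemic logic. The symbols are an illustrative choice of ours, not those of the cited authors: $K_S P$ reads "S knows that P", $K_Y P$ reads "Y knows that P", $E$ ("both know") abbreviates the conjunction over the two agents, and $C$ denotes common knowledge.

$$
\begin{aligned}
E\,P &\equiv K_S P \land K_Y P\\
E^{1} P &\equiv E\,P, \qquad E^{n+1} P \equiv E\,(E^{n} P)\\
C\,P &\equiv \bigwedge_{n \ge 1} E^{n} P
\end{aligned}
$$

Unpacking the second level gives $E^{2} P = K_S(K_S P \land K_Y P) \land K_Y(K_S P \land K_Y P)$, which, since knowledge distributes over conjunction in standard epistemic logic, yields the clauses "S knows that Y knows P" and "Y knows that S knows P"; each further level adds one layer of nesting. The infinite conjunction is an idealization, in line with the point that agents need not represent the whole hierarchy but should be able to infer any given level of it.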
Second, the sharing of common ground is closely linked to what philosophers call mutual recognition (Brandom, 2007; Satne, 2014; Scanlon, 1998), which allows individuals to identify and accept each other as social agents. Such mutual recognition requires each individual to identify the other as a partner for the available interaction and to generate expectations and anticipations based on physical and social features, information about previous interactions, or social structures like norms or conventions. Moreover, mutual recognition requires implicit or explicit confirmation that the other, as a social partner, accepts the interaction. One of the most reliable ways to establish this recognition is to rely on communicative strategies.
Robots regularly face failure situations because of common ground uncertainty or poor coordination during joint action. Joint attention therefore appears to be an essential component of human-robot interaction, and it has been the focus of numerous studies that have led to significant advances in HRI (see Admoni & Scassellati, 2017; Boucher et al., 2012; Huang & Thomaz, 2010; Staudte & Crocker, 2009). Because joint attention allows partners to share a perceptual common ground, it improves coordination, reduces uncertainties, and helps partners reach the common goal. Similarly, establishing mutual recognition, and thus recognizing the robot as a social partner, is necessary to make human-robot interactions less ambiguous.

Communication for mutual recognition: examples of recognitives and observatives
The recognition of the other as a potential partner for joint action can be carried out through verbal and/or non-verbal communicative cues, which can be more or less explicit at different stages of the interaction. The inferential processes at play in such contexts were originally explored within pragmatic theories, in particular through the notion of relevance (Sperber & Wilson, 1995) and Grice's (1989) account of conversational implicature. Interestingly, humans often establish communicative strategies to facilitate information exchange before the joint action itself, even in situations where social norms, conventions, or scripts are available to regulate social interactions (Andrews, 2012; Fernández Castro & Heras-Escribano, 2020; Schank & Abelson, 1977; Zawidzki, 2013). For instance, as customers, we usually know how to interact with a waiter in a restaurant, because both parties know clear rules of etiquette, social norms, and ways of proceeding that regulate the interaction toward the joint goal of having a meal. However, even when such rules and norms exist, human interactions require signaling and communicating different types of information regarding the initiation, maintenance, or exit of joint action, the acknowledgment of role assignments, or specifics regarding preferences, goals, and subtasks. One can engage in such communication by employing so-called recognitives or observatives, speech acts whose main function is to draw another person's attention to herself or to other aspects of the context in order to make her aware that recognition is in place.
One example of recognitives is vocatives, like greetings, which are used precisely to draw a person's attention to herself. Vocatives can enable mutual recognition and facilitate role assignment in some contexts (e.g., "Welcome to our restaurant!" in the previous example). Moreover, vocatives are often followed by other speech acts, like questions, that can help set the sub-tasks or goals of the joint action (e.g., "What can I do for you today?"). Another example of recognitives is acknowledgments, whose function is to make the other aware that you recognize or take up what they say (e.g., answering "thank you" to the vocative "welcome"). They allow individuals to acknowledge each other's recognition and to ensure that the fact that joint action will take place is mutually known.
The other type of speech act relevant for mutual recognition is observatives, which serve to identify a potential joint goal by directing the other's attention toward a specific object or event in the near environment. For instance, imagine two hunters searching for prey: when one calls out to the other "Hey, a deer!", they can start coordinating to capture the animal. Such speech acts can facilitate the recognition of the other as a potential partner for the joint action and then trigger the set of expectations and anticipations necessary to coordinate and perform the action.
In a nutshell, humans have a whole set of speech acts at their disposal to facilitate the establishment of mutual recognition between partners and, in turn, the initiation of joint action. These analyses are further supported empirically by studies showing, for example, how adults (Brosnan et al., 2012) and 4-year-old children (Duguid et al., 2014) use verbal communication to achieve common knowledge and solve the problem of deciding whether or not to cooperate in some contexts.
Even though the verbal modality can substantially modify the common ground between speakers (see also Clark, 1992; Stalnaker, 1978, 2002 for linguistic accounts), it is far from the only way to establish mutual recognition between individuals. Non-verbal modalities of communication analogous to recognitives or observatives also exist. For instance, communication can stem from subtle cues like a mere reaction to the other's presence, such as a frown or a search for eye contact. As Brinck and Balkenius (2018) argue, by making eye contact, one individual attends to the other attending to the first, which in most social contexts can implicitly be regarded as a joint commitment to interact. Analogs of vocatives can also be produced through other embodied strategies, such as widening the eyes, partially opening the mouth, or suddenly stilling the limbs, all of which are likely to reflect the possibility of, and/or a search for confirmation to establish, a joint action (Reddy & Morris, 2004, p. 658).
Likewise, acts of acknowledgment can be performed non-verbally: people often direct each other's attention toward external objects or events through non-verbal reference, whether it involves vocalizations, gestures, and/or gaze (Bard, 1992; Bates, 1979; Brinck, 2008; Leavens et al., 2004; Leavens et al., 2005). Non-verbal reference includes four essential actions: a preparatory behavior that draws the observer's attention to the sender; a communicative-intent indicating behavior that signals the sender's attempt to share attention and interact face-to-face with the observer; a referential behavior that orients the other's attention toward the target object or event; and an essentially intentional behavior that redirects attention back to the sender to make sure the observer has understood the act (Brinck, 2008, pp. 122-123).
Another way of establishing mutual recognition through non-verbal communication is to rely on ostensive acts, which can be regarded as non-verbal observatives. Notice, for example, how humans from early infancy are capable of predicting a course of action based on a target's hand movement (Koch & Stapel, 2019), sustained visual attention (Vaish & Woodward, 2010), or contextual constraints (Gergely et al., 2002). These capabilities are often used to build communicative strategies. For instance, Csibra and Gergely (2006) suggest that infants read some social signals, such as eye contact, as pedagogical signals indicating that what follows (for instance, a course of action) is an important piece of information. In other words, infants interpret eye contact as an "ostensive" act, understanding that the adult intends to communicate important information imminently. Moreover, parents often unconsciously produce signals that reinforce this capacity, which generates a learning loop established through communication.
This type of communication thus aims not only to orient the receiver's attention to a particular object, but also to make her aware that the sender intends to share a particular purpose. In the context of joint action, this type of referential act, or even one of its components, can therefore play the same function as speech acts of acknowledgment: it can help initiate joint action and ensure that this possibility is common knowledge (Carpenter & Liebal, 2011; Chwe, 2001; Gómez, 1996; Thomas et al., 2014), or facilitate the recognition of a social affordance (Becchio et al., 2010; Krueger, 2011), i.e., a property of an object or event that permits or forbids a social action. For instance, Sartori et al. (2009) investigated how another person's movements (e.g., extending the arm with an open hand toward an object) can perturb or influence the kinematics of a preplanned action. In this study, participants were asked to reach for, pick up, and place an object; however, in 20% of the trials, the experimenter unexpectedly stretched out her arm and unfolded her hand in a requesting gesture. A significant variation in trajectory was reported and, in some cases, participants even abandoned their plan in order to give the object to the experimenter. In contrast, no perturbation was found when the experimenter's movement was arbitrary (i.e., had no request features). These findings suggest that social signals can trigger social affordances and activate motor responses, which could facilitate the anticipation of behavior.
The establishment of mutual recognition is fundamental for the initiation of joint action but also strongly influences its unfolding. For instance, establishing mutual recognition facilitates the assignment of roles, which in turn determines the communicative strategies used during the execution of the action. The studies on the exaggeration of behavior mentioned in Section 2 illustrate this point: in Sacheli et al.'s (2013) experiments (see also Vesper & Richardson, 2014), for instance, two participants had to synchronously grasp an object in an imitative vs. complementary way, each acting as either a Leader or a Follower. The results showed that, when acting as leaders, participants tended to give their partners information about the action to be performed by accentuating some kinematic parameters and reducing the variability of their movements, thereby making them more predictable for the follower. Further research has shown that the assignment of Leader and Follower roles influenced how participants adapted their movements to improve interpersonal coordination in the joint action (Curioni et al., 2019). In an experiment with musicians, Goebl and Palmer (2009) showed that leaders in piano duets raised their fingers higher in the absence of auditory information, in order to communicate movement timing to their partner. These experiments indicate that signaling is fundamental during the performance of a task, even when roles have already been assigned to the participants before they engage in joint action: it helps coordinate interactions and optimize the joint action by minimizing uncertainties for each partner and thus increasing the predictability of the sequence of actions (Pezzulo et al., 2013; Pezzulo & Dindo, 2011). These studies also illustrate that mutual recognition and the communicative strategies used to engage in joint action are fundamental both to confirm the assigned roles in a given situation (thus allowing individuals to initiate the action) and to maintain joint action throughout the activity. In other words, to properly understand the importance of communication in joint action, we should take into account not only how a specific piece of communicated information may influence the common ground at a given step, but also how it can affect subsequent steps and the joint action as a whole (Vesper & Sevdalis, 2020).
Now, how is this communication for recognition relevant to HRI? To answer this question, we must emphasize that the aforementioned communicative strategies, insofar as they facilitate the establishment of recognition, are important tools for reducing uncertainties of different types (see Section 3.1). For instance, the use of recognitives such as vocatives directly reduces motivational uncertainty by implicitly declaring that one is ready to interact. At the same time, the establishment of roles facilitates the reduction of common ground and instrumental uncertainties, to the extent that it makes explicit the rules and social structures at stake in the context and makes mutually manifest certain general beliefs about how to proceed. Reducing these uncertainties is especially relevant in HRI. On the one hand, we have seen how factors like the novelty effect may produce a certain reluctance to interact with the robot; these effects can be counterbalanced precisely by establishing recognition and expressing motivation to act. On the other hand, making environments less uncertain is especially relevant in HRI, as we know that, despite advances in social robotics, robots are still particularly prone to failure in unstructured contexts (Honig & Oron-Gilad, 2018). It is precisely the establishment of roles and the reduction of instrumental and common ground uncertainties that allows them to perform better in such environments. Moreover, such strategies will become even more necessary when robots start to take on different roles in the same context, for example a robot that can act as either a co-worker or a salesperson in a shop.³

Commitment, emotion, and motivation
All the studies described above suggest that communicative signals play a central role in enabling prediction and coordination. By helping establish mutual recognition, these signals can also increase the motivation to initiate and perform the joint action, which has been theorized through the notion of commitment. When individuals seek eye contact with someone else, they are not only making the other aware of their availability to engage in a joint action; they are also implicitly declaring their commitment to interact and to behave in accordance with this expectation. In a series of experiments, Siposova et al. (2018) presented 5- to 7-year-old children with a Stag Hunt coordination game, in which they could decide to play individually or to cooperate. In a first study with 5-year-olds, the authors compared two conditions: in the first, adults produced ostensive-communicative looks, with eyes wide open and raised eyebrows, while in the second, they produced non-communicative looks, i.e., shorter sequences without raised eyebrows. The authors hypothesized that only communicative eye contact would permit mutual recognition and thus encourage children to play cooperatively. In a second study, the authors investigated the reactions of 6- to 7-year-old children when the partner did not cooperate. The results showed that children tended to cooperate more in the communicative eye contact condition and to protest more when the partner did not cooperate. These studies highlight that communicative eye contact can be regarded as a form of commitment to cooperation and can thus increase coordination. In other words, showing communicative-intent indicating behavior (see Section 3.2.2) can signal to the other that one is ready to initiate a particular joint action, which could provide motivation for that action.
Indeed, humans tend to interpret some social signals as implicit commitments or obligations to behave as the joint action demands. Michael and Pacherie (2015) have emphasized this connection between motivation, expectation, and commitment in joint action. According to them, commitments play a fundamental role in joint action by stabilizing expectations, which can reduce the uncertainties inherent in any interaction and provide reasons for cooperation. When people indicate that they are committed to performing a particular action, they are not only reducing motivational uncertainty (by providing reliable cues about their action) but also providing reasons to cooperate, on the basis that they are going to do their part and expect the other to do the same.
The idea that we find others' expectations about our own behavior motivating and appealing in themselves is not new in philosophy and psychology. For instance, Lewis (1969) argues that we assume the existence of presumptive reasons: we are inclined to act so as to fulfill others' expectations when it is reasonable for them to have such expectations, i.e., under certain conditions. In a series of experiments, Heintz et al. (2015) showed that people's prosocial preferences in dictator games, where one subject (the dictator) must decide whether or not to send some of the money he/she receives to another subject (the recipient), are sensitive to the other's expectations. In their studies, dictators systematically made more altruistic choices when told that the recipient expected to receive a particular amount of money. Similarly, Bonalumi et al. (2019) tested sensitivity to implicit signals about others' preferences in the context of joint action. In one of their studies, participants were presented with vignettes describing situations in which implicit commitments between characters were violated. For instance, one situation involves two work colleagues who always have coffee together in the same place, although they never explicitly agreed to do so. They keep having coffee together for some time until one of them does not show up. Participants were then asked several questions to assess their perception of the situation, in particular regarding the character's right to demand an explanation from the other and the assumed intensity of negative emotions if no explanation was offered. The experimenters also introduced a temporal variable (e.g., how long the coffee routine had been going on). They found that people tended to attribute more negative emotions and make more negative normative judgments when the commitment was violated after a long routine than after a more recent one. This suggests that the repetition of a joint action is perceived as an implicit cue to commitment, and that "people judge there to be an obligation to fulfill others' reasonable expectations even when these expectations have not been made explicit" (p. 685). In a nutshell, both implicit and explicit social signals regarding commitments enhance the motivation to engage in joint actions.
Moreover, Heintz et al. (2015; see also Sugden, 2000) postulate that the human proclivity to engage in joint activities and fulfill others' expectations is caused by our aversion to disappointing others. Along similar lines, Godman (2013; Godman et al., 2014) has argued that humans tend to engage in joint action due to a general psychological disposition to find affiliative stimuli rewarding (see also Salmela & Nagatsu, 2016). Interactions with others would then be intrinsically motivating, and joint actions would be motivated not just by the desire to achieve an intended and shared outcome, but also by the desire to obtain this social reward. Evolutionary psychologists have theorized this idea of social reward through the notion of reciprocity: cooperating with others would be motivated by the expectation of receiving benefits in return, which can be direct or indirect and occur at different timescales (e.g., Romano & Balliet, 2017).
A more constrained hypothesis has been put forward by Fernández Castro and Pacherie (2021), who argue that, although humans give attentional priority to social cues and find them rewarding, their prosocial tendency is rooted in the need to belong (Baumeister & Leary, 1995; Over, 2016), i.e., the need for frequent, positively valenced interactions with other people within a framework of long-lasting concern for each other's welfare. In other words, joint actions are often motivated by the need to affiliate with others, to form long-lasting bonds, and to preserve and reinforce the bonds already forged.
Both implicit and explicit communicative cues can influence these motivational factors, reinforcing commitment and thus promoting joint action. Chartrand and Bargh (1999), for instance, demonstrated the motivational impact of the chameleon effect, i.e., the nonconscious mimicry of postures, gestures, and facial expressions when interacting with a partner. Individuals who were more likely to cooperate and who had more empathic dispositions were found to systematically exhibit the chameleon effect to a greater extent. These findings suggest that mimicry can be interpreted as a signal of a disposition toward prosociality, therefore facilitating the initiation of joint action. Similarly, synchrony has been described as a rewarding social cue: Reddish et al. (2013) found that moving together to the same rhythm also promotes prosocial behavior in economic games. The detection of these implicit communicative cues generally involves activation of the amygdala, regarded as an integrative center for emotions and associated in particular with the reward system (Gamer et al., 2013).
More broadly, emotional expression may represent an important communicative strategy for facilitating joint action. Shared emotions, i.e., emotions whose expression has been directed to a partner and detected by the latter, have indeed been argued to play a pivotal role in motivating the initiation and continuation of joint action (Michael, 2011). They can help monitor performance and/or engagement: "a person's emotional expressions can transmit information about how she appraises her progress toward the goal of her own task, or the group's progress toward the global goal of a joint action" (p. 364). Thus, expressing emotions like excitement or enthusiasm toward the joint goal can motivate the partner to remain engaged in the joint action. Similarly, if one's performance during joint action provokes an expression of distress or dislike in the partner, one can modify one's behavior to avoid annoying or disappointing the other, thereby improving subsequent interactions. In short, emotional expressions can make information relevant to joint action manifest.
The establishment and maintenance of commitments through communicative strategies, as well as the expression of shared emotions, are potentially important tools for the design of social robotic agents. Insofar as they make intentions to carry out a collective action transparent and mutually manifest, commitments and shared emotions can reduce motivational uncertainty, i.e., the lack of confidence that the other party will carry out the collective action. In HRI, the communication of commitments and shared emotions could help overcome two fundamental obstacles. First, during the initiation of the joint action, these strategies could help neutralize users' possible reluctance to deal with robots, precisely because a robot's display of commitment might motivate the user to interact with it. Second, during the execution of the action, expressions and cues that help maintain motivation are important in HRI, given the opacity with which robots sometimes operate and users' limited experience in dealing with robots. In fact, the use of emotional expressions to attract or maintain users' attention has already been studied in some laboratories, for instance using a sad face to inform users of a failure (Reyes et al., 2016).

Concluding remarks
In the previous section, we reviewed recent literature in philosophy and psychology that highlights the triple function of communication in joint action: from mutual recognition between individuals to the expression of commitment and social expectations, communicative cues can facilitate coordination, prediction, and motivation at different steps of the joint action. Taken together, the notions described above suggest that we can approach HRI through an integrated perspective on robotic communication in order to progress toward more robust social robots, able to engage in collaborative activities. Currently, while social robotics has devoted considerable effort to equipping robots with specific capabilities, for example recognizing human gestures or sending signals to make their actions more transparent, this expertise does not always translate into more fluid or more efficient interactions. This might be an effect of ever-increasing specialization, which sometimes prevents researchers from considering the joint action as a whole, i.e., from analyzing the various communicative constraints and expectations that arise both when initiating and when performing the action. For instance, the lack of communicative strategies dedicated to establishing mutual recognition before joint action may activate the human's prior unrealistic expectations, which can undermine prediction throughout the whole process.
These difficulties can be overcome by adopting an integrated approach to HRI, i.e., an approach that avoids compartmentalization (Belhassein et al., 2020; Menezes et al., 2014; Seibt et al., 2020; Young et al., 2011). Once we regard communication as a way to add to or modify the shared pool of relevant information in a given situation, every communicative device, rather than working in isolation to address specific problems, can indirectly influence the information that affects joint action as a whole. To establish mutual recognition between partners (a necessary prerequisite for initiating joint action), the robotic agent can, for example, restrict the type of shared information through scripts or social norms, making communicative signals more transparent. Verbal or gestural devices must therefore be oriented toward making mutually manifest the different pieces of information relevant to the joint action, regarding the partner's goals, plans, or beliefs. In other words, these devices can serve to constrain or add information in ways that facilitate the reduction of different types of uncertainty.
Such an integrative approach may also offer further flexibility: well-designed communicative strategies can allow the robot to adapt to different contexts, in which the roles assumed and the degree of common ground shared with humans can vary. For example, the robot may have to interact differently with a human depending on whether the latter is a familiar partner or a stranger. Finally, this integration may provide some continuity to the interaction, which is particularly useful for determining the meaning of communicative signals. For example, recognizing a sign of frustration in a difficult collaborative context allows the agent to know that the partner has a problem. However, without a clear representation of the common ground in which to situate the specific action or sub-goal being carried out, adjusting to that problem becomes more difficult. This can be even more challenging in situations where one agent detects a failure or an error that needs to be repaired, for instance, if the partner follows a wrong course of action. The continuity and stability of communication thus play a fundamental role not only in achieving objectives but also in motivating the agents to pursue them.
The present work has aimed to suggest some exploratory ideas for improving prediction and boosting motivation in the context of joint action between humans and robots. The current state of the art in social robotics shows that some key ingredients are already available as a basis for developing communicative strategies, especially when focusing on human-robot contingencies (i.e., changes in an agent's behavior in direct response to a signal from another agent) as an indicator of commitment to the interaction (Lee et al., 2011). A well-studied example is eye-gaze signaling, which has been shown to be an important source of understanding in HRI (Admoni & Scassellati, 2017; Kirchner et al., 2011; Moon et al., 2014; Staudte & Crocker, 2009). For instance, Breazeal et al. (2005) demonstrated that equipping robots with subtle eye-gaze signals (such as enabling the robot Leonardo to re-establish eye contact with the human when it finishes its turn, and then communicating that it is ready to proceed to the next step of the task) improves the partners' understanding and allows them to quickly anticipate and address potential errors in the task. Similarly, human performance during a cooperative task with a robot improves significantly when people can follow the robot's gaze (Boucher et al., 2012) or when the robot produces deictic gestures combined with directed gaze (Häring et al., 2012). Moreover, in ambiguous handover situations, people tend to comply with the direction of the robot's gaze (Admoni et al., 2014).
In addition, researchers are seeking to design robots with perspective-taking skills so that they can collaborate more efficiently (Trafton et al., 2005). During a human-robot joint activity, a robot may indeed need to reason about the human's mental states and knowledge of the situation in order to adapt its behavior (Scassellati, 2002) and make decisions (Görür et al., 2017). Devin and Alami (2016) developed a framework implementing such a robotic Theory of Mind (ToM): their system takes into account the human's knowledge about the collaborative task in order to identify contingencies and provide her with the information she needs to reach the common goal, and only that information.
The contribution of different disciplinary perspectives, in particular through dialogue with psychology and philosophy, may therefore represent an important step toward maximizing current results in social robotics and improving the robot's competence to collaborate with humans in different social environments. Finally, focusing closely on communication between humans and robots during joint action could allow researchers to widen the application contexts (e.g., tasks, environments) of human-robot collaboration systems, possibly echoing Thomaz et al. (2016), for whom "implementing a collaborative planner in a complex realistic setting would be an apt grand challenge for the human-robot collaboration community".

Declaration of competing interest
We hereby declare that there are no potential conflicts of interest that could bias the evaluation or results of our research.
