12 Studying Human-Human interaction to build the future of Human-Robot interaction

Understanding human-to-human sensorimotor interaction, in a way that can be predicted and controlled, is probably one of the greatest challenges for the future of Human-Computer Confluence (HCC). This would make it possible, for example, to optimize group decision-making or brainstorming efficacy. It would also offer the means to naturally introduce artificial embodied systems into our social landscape. This vision sees robots or software that smoothly interface with our social representations and adapt dynamically to social contexts. The path to such a vision requires at least three components. The first, driven by cognitive neuroscience, has to develop methods to measure the real-time information flow between interacting participants in ecological scenarios. The second, shaped by the Human-Robot Interaction (HRI) field, consists in building the proper information flow between robots and humans. Finally, the third will have to see the convergence of robotics, neuroscience and psychology in order to functionally evaluate the reality of a long-term HCC.


Sensorimotor Communication
Understanding verbal and non-verbal communication in humans is a rather easy task we master in childhood. At the same time, communication remains a rather vague concept, since there are too many unknown degrees of freedom and the same measurable configuration can change in a manner we cannot comprehend, predict or control. However, the chemistry of group interaction, e.g. the subjective feeling of being "in sync" with other people, is something we can perceive easily. We do that instinctively because we are innately social creatures that by definition send and receive (implicit) socially relevant messages in all our interactions (e.g. hand gestures, facial expressions, etc.). We do that in real time, in an extremely adaptive manner, and at no cost.
Such a capability may be supported by the complex sensory-motor properties present in the human motor system. In fact, in addition to their clear role in movement planning and execution, some premotor neurons have activity that could be interpreted as complex visual responses. For example, "mirror neurons" discharge both when the monkey executes an action and when it observes another individual making the same action in front of it (Gallese et al., 1996). This discovery has stimulated basic research on how, why and when we send and receive explicit and implicit sensorimotor messages during social interaction. Along these lines, human research showed that the cortico-spinal excitability of hand muscles, tested via Transcranial Magnetic Stimulation (TMS), is modulated during the passive observation of hand actions (Fadiga et al., 1995). Similarly, it was also shown that passive listening to speech sounds that require tongue mobilization modulates the cortico-bulbar excitability of the tongue (Fadiga et al., 2002). In both cases, the observer/listener shares similar motor programs with the actor/speaker. Therefore, the recognition of others' actions/speech may exploit knowledge of how to produce that particular stimulus.

Computational Advantages of Sensorimotor Communication
This general computational principle has supported a major shift in cognitive systems research. In fact, the human motor system was once believed to be mainly an output system. However, the motor brain could also play a role in perceptual and cognitive functions. This challenges the classical sensory versus motor separation (Young, 1970) and opens the door to embodied cognition and robotics research (Clark, Grush 1999). However, automated systems cannot reach human-like performance when dealing with real coding/decoding of these signals. This simple fact forces us to start exploiting human brain/body solutions. All attempts that do not take this fact into account are bound to be unreliable in variable environments, to fail in the generalization to new examples and to be unable to scale up to solve more complex problems.
These considerations in HCC are pivotal to humanoid robots, in which the explicit design of morphological and functional body features must be complemented with a human-like cognition in order to elicit a human-like interaction and communication (Ferscha, 2013). More importantly, such innate human "behavioural-coupling" capability has to interact with situational intervening factors, which can dramatically hinder a successful information flow. For example, in mediated communications, many relevant cues are often filtered out because of particular constraints of the medium. As a consequence, things are even more complicated when designing the communication with artefacts (i.e. robots). Modelling and implementation of these automatic systems impose the additional new challenge of adaptability to context. Therefore, we suggest that the natural human-to-human coordination capability is the only guide in the development of the future human-to-robot and human-computer interaction in general. In the following sections we will describe the most recent efforts in measuring sensorimotor human-to-human interaction. Next we will overview the strategies to build robot-to-human interaction and finally we will stress the need to quantify the efficacy of the human-to-robot interaction.

Human-to-Human Interaction
Sensorimotor communication forms the basis of unmediated communication in both animals and humans (Rands et al., 2003; Couzin et al., 2005; Nagy et al., 2010). Complex coordinated behaviour between multiple individuals can arise without the need for verbal communication (Sebanz et al., 2010; Neda et al., 2000). One important aspect of this kind of communication is the absence of any symbolic component, making such information flow automatic and implicit in nature. Behavioural research has shown an implicit relationship between the stability of intrapersonal coordination and the emergence of spontaneous interpersonal coordination (Coey et al., 2011). Furthermore, individual differences in synchronization performance play a significant role, suggesting that temporal prediction ability may mediate the interaction of cognitive, motor, and social processes underlying joint action (Pecenka, Keller, 2011). In general, behavioural coordination results from establishing interpersonal synergies. Interpersonal synergies are higher-order control systems formed by coupling movement system degrees of freedom of two (or more) participants. Characteristic features of synergies identified in studies of intrapersonal coordination are also revealed in studies of interpersonal coordination in interactive tasks (Riley et al., 2011).
Interestingly, a growing body of neuroscientific evidence indicates that such interpersonal coordination or "group mind" phenomena occurring during interactive tasks are mediated by synchronized cortical activity (Hasson et al., 2012; Loehr et al., 2013; Schippers et al., 2010; Lindenberger et al., 2009). However, classical approaches in social neuroscience, and in the field of hyperscanning, typically search for significant changes in brain activity after specific training that is supposed to augment coordination (Yun et al., 2012) or during engagement in a rather rigid/constrained social interaction in general (Hasson et al., 2012; Riley et al., 2011). It is therefore important to note that typical behavioural and neuroimaging experiments rarely implement the ecological complexity of natural interaction. Rather, for control purposes, they devise forced turn-taking or a constrained communication mode.

Ecological Measurement of Human-to-Human Information Flow
However, recent studies have approached the problem from a radically different perspective (D'Ausilio et al., 2012; Badino et al., 2014). These studies unobtrusively measured motion kinematics from "real" groups embedded in "real" social interaction and extracted the continuous information flow between participants. Specifically, the motion kinematics of ensemble musicians were recorded, and computational methods were then used to extract the information flow between them. Ensemble musicians are experts in non-verbal interaction and behave like processing units embedded within a complex system. Each unit possesses the capability to transmit sensory information non-verbally, and to decode others' movements, potentially via the mirror matching system. As these two flows of information occur simultaneously, each unit, and the system as a whole, must rely heavily on predictive models. Thus, the musical ensemble behaves like a complex dynamical system having, however, important constraints that turn into benefits from an experimental perspective.
The quantification of inter-individual information transfer has rarely been attempted in the context of ecological and complex interaction scenarios. To this end, violinists' and conductors' movement kinematics were recorded during the execution of Mozart pieces, and causal relationships among musicians were searched for using the Granger Causality (GC) method. It was shown that an increase in conductor-to-musicians influence, together with a reduction in musician-to-musician coordination (an index of successful leadership), goes in parallel with the quality of execution, as assessed by musical experts' judgments. This study shows that the analysis of motor behaviour provides a potentially interesting tool to approach the rather intangible concept of aesthetic quality of music and visual communication efficacy (D'Ausilio et al., 2012).
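The logic of a Granger-style index can be illustrated with a minimal sketch. It is not the pipeline used in the cited study: it assumes a single lag, two synthetic one-dimensional signals, and a plain least-squares fit, and the names `granger_lag1` and `simulate` are hypothetical. The index is the log ratio of residual errors obtained when a musician's signal is predicted from its own past alone versus from its own past plus the conductor's past.

```python
import math
import random


def _demean(series):
    m = sum(series) / len(series)
    return [v - m for v in series]


def granger_lag1(source, target):
    """Lag-1 Granger index from `source` to `target` (illustrative only).

    Returns ln(RSS_restricted / RSS_full): the restricted model predicts
    target[t] from target[t-1]; the full model also uses source[t-1].
    """
    x, y = _demean(source), _demean(target)
    y_now, y_prev, x_prev = y[1:], y[:-1], x[:-1]

    s_yy = sum(a * a for a in y_prev)
    s_xx = sum(a * a for a in x_prev)
    s_yx = sum(a * b for a, b in zip(y_prev, x_prev))
    c_y = sum(a * b for a, b in zip(y_now, y_prev))
    c_x = sum(a * b for a, b in zip(y_now, x_prev))

    # Restricted model: one coefficient, closed-form least squares.
    a_r = c_y / s_yy
    rss_r = sum((t - a_r * p) ** 2 for t, p in zip(y_now, y_prev))

    # Full model: solve the 2x2 normal equations with Cramer's rule.
    det = s_yy * s_xx - s_yx ** 2
    a_f = (c_y * s_xx - c_x * s_yx) / det
    b_f = (c_x * s_yy - c_y * s_yx) / det
    rss_f = sum((t - a_f * p - b_f * q) ** 2
                for t, p, q in zip(y_now, y_prev, x_prev))
    return math.log(rss_r / rss_f)


def simulate(n=500, seed=0):
    """Synthetic data (an assumption): the musician follows the conductor."""
    rng = random.Random(seed)
    conductor, musician = [0.0], [0.0]
    for _ in range(n - 1):
        musician.append(0.8 * conductor[-1] + 0.1 * rng.gauss(0, 1))
        conductor.append(0.5 * conductor[-1] + rng.gauss(0, 1))
    return conductor, musician


conductor, musician = simulate()
drive = granger_lag1(conductor, musician)      # conductor -> musician
feedback = granger_lag1(musician, conductor)   # musician -> conductor
```

On this synthetic pair, `drive` should come out far larger than `feedback`, which is the kind of asymmetry a leadership index relies on. Real kinematic recordings would need more lags, stationarity checks and statistical testing.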
The subsequent work found a clear positive relationship between the amount of communication and complexity of the musical score segment. Furthermore, temporal and dynamical changes applied to the musical score were devised in order to force unidirectional communication between the leader of the quartet and the other participants. Results show that in these situations, unidirectional influence from the leader decreased, thus implying that effective leadership may require prior sharing of information between participants. In conclusion, it was possible to measure the amount of information flow and sensorimotor group dynamics suggesting that the fabric of leadership is not built upon exclusive information knowledge but rather on sharing it (Badino et al., 2014).
These studies suggest that, with minimal invasiveness and during real interaction, we can possibly measure the information flow between two (or more) human participants. The next step we suggest is to build robots that are capable of eliciting a realistic robot-to-human interaction. In this sense the goal is to foster a dynamical pattern of information flow between natural and artificial agents.

Robot-to-Human Interaction
It is usually predicted that the inclusion of robots in our society will progressively become more widespread. However, one of the biggest obstacles to a pervasive use of robots supporting and helping humans in their everyday chores lies in the absence of intuitive communication between robotic devices and non-expert users. Several attempts have been made at achieving seamless human-robot interaction (e.g., Sisbot and Alami 2012; Dragan, Lee et al. 2013), even with positive outcomes in the context of small manufacturing industries (e.g., the manufacturing robot Baxter). However, the lack of a systematic understanding of what works and why does not allow this success to be generalized to different domains. Therefore, in order for robotics, and in particular humanoid robotics, to become a common and functional element of our society, a deeper comprehension of the principles of human-human interaction is needed. Only this knowledge will pave the way to the design of robotic platforms that are easily usable and understandable by everybody.
However, the investigation of social interaction is a very challenging task. The dynamics of two agents performing a joint action together is much more complex than just the sum of the behaviours of the two individuals. The actions, the movements and even the perceptual strategies each partner chooses are substantially modified and adapted to the cooperation. Traditional research on this topic has been conducted by analysing recordings of interactions a posteriori, with the disadvantage of not being able to intervene or to selectively modulate the behaviour of the interacting partners. In more constrained scenarios, a human actor has been used as the stimulus. Although this approach provides an increased level of control, not all aspects of human behaviour can actually be manipulated. In particular, the automatic behaviours that constitute a great part of natural coordination are very difficult to restrain. As a potential solution, video recordings have often been adopted as stimuli for an interaction, especially in the context of action observation and anticipation. This approach guarantees more control and perfect repeatability, but on the other hand it eliminates some fundamental aspects of real collaborative scenarios, such as the shared space of actions, the physical presence, the possibility to interact with the same objects and even the potential physical contact between the two partners.
A valuable, novel solution to these problems could be represented by robots and in particular by humanoid robots. These are embodied agents, moving in our physical world and therefore sharing the same physical space and being subject to the same physical laws that influence human behaviour. Robots with a humanoid shape have the additional advantage of being able to use the tools and objects that have been designed for human use, making them more adaptable to our common environments. Moreover, the human shape and the way humans move are encoded by the brain differently with respect to any other kind of shape and motion (Fabbri-Destro and Rizzolatti 2008). Consequently, humanoid platforms can probe some of the internal models already developed to interact with people and allow studying exactly those basic mechanisms that make human-human interaction fluid and efficient. Thus, humanoid robots represent an ideal stimulator, i.e. a "physical collaborator" whose behaviour could be controlled in a repeatable way. Not only do they share the partner's action space and afford physical contact, but they can also monitor in real-time the performance of their partner through their on-board sensors and respond appropriately enabling the investigation of longer and more structured forms of interaction.

Two Examples on How to Build Robot-to-Human Information Flow
We suggest that this kind of technology is particularly suited to investigating the very basic mechanisms of interaction: how the motor and sensory models of action and perception change when an action is performed in collaboration rather than solo, or which specific properties of motion are most relevant in allowing an immediate comprehension between co-operators (Sciutti, Bisio et al. 2012). In this sense the focus is on implicit communication, one specific aspect of social interaction. A classic example of implicit communication is represented by gaze movements. We unconsciously move our eyes, fixating objects of interest or landmarks to which our actions will be directed (Flanagan and Johansson 2003). Human beings are extremely sensitive to the direction of others' gaze. For instance, we follow someone else's gaze to recognize and share the focus of their attention, and by looking at someone else's eyes we can infer whether they are paying attention to what we are showing them (Lohan, Griffiths et al. 2014). Moreover, we can often anticipate the intention of our partners by noticing what they are fixating, or even infer whether they are thinking or paying attention to us by observing the way they move their eyes around.
To clarify the effect of this implicit signal on interaction, a series of experiments manipulated the way in which the humanoid robot iCub moved its eyes during a turn-taking game. The robot either looked in the direction of the partner after its turn finished or kept its eyes fixed. We assessed whether the way participants played the game was influenced by this difference in gazing behaviour and by a different degree of robot autonomy. Interestingly, even though robot gazing was not relevant for the game play, participants modified their playing strategy, apparently attributing more relevance to the robot's actions when it exhibited an interactive gazing behaviour and more autonomy (Sciutti, Del Prete et al. 2013). Hence, a simple manipulation of the robot's gaze motion has an actual impact on how humans behave in an interaction.
The eyes, however, are not the only carriers of unconscious communication. Indeed, even our common movements provide a quantity of information that is implicitly read by human observers. When we are reaching to grasp a cup, someone looking at us can often anticipate which cup, among the ones on the table, we are going to take and whether we will drink from it or store it away. Moreover, when we transport the cup, our motion tells the observer whether it is full or empty or whether it is too hot to be handled. Understanding where all this information is encoded could potentially allow the design of robot shape and motion to be simplified while preserving the efficiency of a rich implicit communication.
To this aim the parameters of a lifting action were evaluated to verify whether they could convey enough information to the observer. This information might offer a cue not only to infer the weight that the agent is carrying, but also to prepare to perform an appropriate action on the same object afterwards. On the humanoid robot iCub the velocity profile of the actions was changed to assess under which conditions robot observation was as communicative as human observation. Interestingly, even a simplified robotic movement, sharing with human actions only an approximation of the relationship between lifting velocity and object weight, was enough to guarantee both to adults (Sciutti, Patanè et al. 2014) and to older children (10 years old, Sciutti, Patanè et al. 2014) a correct understanding of the load, with a performance comparable to that measured for the observation of human lifters.
Therefore, even a shape that is not exactly human-like and a motion that is a simplified version of that adopted by humans are enough to allow for a rich transfer of information through action observation. Hence, a very basic form of social intelligence, such as the efficient transmission of implicit information through action execution (by gaze or by arm motion), can be achieved on humanoid robots, even with a strong simplification of the human-likeness of the robot's shape and motion. We propose therefore that humanoid robots, before becoming companions or helpers, might play the fundamental role of interactive probes, i.e., of tools to derive in naturalistic contexts which human-like properties are actually relevant to foster a natural and seamless interaction. This process has two important consequences. On the one hand, it sheds some light on the mechanisms of social interaction in humans, a complex topic which still requires the research of many neuroscientists and psychologists. On the other hand, it provides design indications, which could result in simpler and cheaper platforms that at the same time offer a natural interface to their human users.
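To make the idea of a weight-dependent velocity profile concrete, here is a minimal sketch under stated assumptions. The bell-shaped minimum-jerk profile is a common textbook model of point-to-point movement and is swapped in purely for illustration. The linear mapping from weight to lift duration and all parameter values are hypothetical, and none of this reproduces the profiles used on iCub in the cited studies.

```python
def min_jerk_velocity(distance, duration, steps=100):
    """Speed samples of a minimum-jerk movement covering `distance` metres.

    v(tau) = 30 * (distance / duration) * tau^2 * (1 - tau)^2, tau in [0, 1].
    """
    samples = []
    for i in range(steps + 1):
        tau = i / steps
        samples.append(30.0 * distance / duration * tau ** 2 * (1.0 - tau) ** 2)
    return samples


def lift_duration(weight_kg, base_duration=1.0, slowdown_per_kg=0.5):
    # Hypothetical linear mapping: heavier objects are lifted more slowly.
    return base_duration * (1.0 + slowdown_per_kg * weight_kg)


def peak_lift_velocity(weight_kg, distance=0.3):
    """Peak speed of the lift; this is the cue an observer could read."""
    return max(min_jerk_velocity(distance, lift_duration(weight_kg)))


light = peak_lift_velocity(0.1)   # e.g. a nearly empty bottle
heavy = peak_lift_velocity(0.4)   # e.g. a fuller bottle
```

In this toy mapping the peak velocity decreases monotonically with weight, so an observer (or a classifier) could in principle rank loads from the velocity profile alone.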

Evaluating Human-to-Robot Interaction
Humanoid robots are not humans, but their appearance and functionality sometimes lean towards human-like appearance and behaviour. Here we suggest that replicating human-like features is useful only if these are central to the development of a natural interaction with humans. This functional stance on robotic design must be derived from basic research, with the aim of removing superfluous computational and architectural costs in a principled manner. The principle of replicating only the minimal human shape and motion features necessary to enable a natural interaction will also have the additional advantage of lowering the risk of entering the uncanny valley of eeriness (Mori 1970) in the attempt to mimic the human being as a whole.
Even when following the principle of replicating only the minimal features needed, robots still have limitations. In fact, the minimalistic approach ultimately descends from a functional perspective, and thus human-likeness has to be judged by the users in a real, long-term interaction in which they cope with the robots' limitations and potentialities.

Short term Human-to-Robot Interaction
Current research suggests not only that humans accept robots in social environments more readily when their behaviour and appearance are human-like, but also that humans can work very efficiently with a natural interaction interface such as a humanoid robot (Sato, Yamaguchi, Harashima, 2007). In creating such a natural interaction interface, the robot's appearance is supportive, but the promises made by the appearance must be kept by the robot's capabilities (Weiss et al. 2008).
Keeping the balance between a human-like appearance and the functionality of the robot is a key point to keep in mind. In fact, the primary goal of social robotics should be that of building functionally effective and coherent implementations of behaviours on any given robotic platform. Thus, it is important not only to base a behaviour strategy for a robot on a human-based model, but also to understand the limitations of the robot's capabilities and the discrepancies that have to be taken into account. In other words, we need to put ourselves in the robot's shoes, or look from the robot's perspective.
Looking from the robot's perspective, the implementation of models based on human behaviour needs a second thought. In this second step we evaluate not only the model implemented on the robot, but also the resulting behaviour of the robot and its effects on humans. One method to evaluate the functional results of the implemented interaction is to evaluate the loop between human and robot. Hence, the idea is to evaluate the interplay present in the interaction of human and robot (Lohan et al. 2012).
Therefore, evaluating human-to-robot interactions should be seen as a two-step process from the very beginning. First, the model based on human social behaviour is implemented on an appropriate robotic platform. This model has to be validated against its given benchmarks and the given hypothesis to be tested. Secondly, the resulting interface between human and robot in interaction may give a higher level of insight into the robot's capacity to establish a social connection. However, even in a constrained scenario, quantifying whether the modelled interaction has effectively been established is not trivial. In this critical second step, measurements of human-human interaction could potentially be used as the ground truth against which to compare the human-to-robot one.
At the same time, methods derived from behavioural research have also recently been employed. Among them, the methodology of conversation analysis is one possibility, which has been used in human-human interaction to evaluate this interplay (Hutchby and Wooffitt, 2008). Furthermore, there are other useful concepts, such as the measurement of contingency (Csibra and Gergely, 2009; Lohan et al., 2011) or synchronization (Kose-Bagci et al. 2010) between interaction partners, that can potentially hint at the interaction quality. These methods take both interaction partners, and therefore both sides, into account. At the same time, it is also true that different social contextual conditions can change the meaning of an interaction, independently of the interaction partner's characteristics (e.g. when greeting someone close to you in a public place, different social rules apply than when greeting the same person in a private space).
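One simple way to put a number on synchronization between two partners is lagged cross-correlation of their movement signals. The following sketch is only an illustration and is not the measure used in the cited works: it assumes two equally sampled one-dimensional signals of the same length, the function names are hypothetical, and the data are synthetic.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag]."""
    n = len(leader)
    if lag >= 0:
        return pearson(leader[:n - lag], follower[lag:])
    return pearson(leader[-lag:], follower[:n + lag])


def best_lag(leader, follower, max_lag):
    """Lag (in samples) at which the two signals are most correlated."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda k: lagged_correlation(leader, follower, k))


# Synthetic example (an assumption): the robot copies the human 3 samples late.
rng = random.Random(1)
human = [rng.gauss(0, 1) for _ in range(200)]
delay = 3
robot = [0.0] * delay + human[:-delay]

lag = best_lag(human, robot, max_lag=10)
peak = lagged_correlation(human, robot, lag)
```

A positive `lag` means the second signal trails the first. The height of the correlation peak gives a rough synchronization score, while its position indicates who tends to lead.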

Long Term Human-to-Robot Interaction
When moving towards social interaction and long-term or even lifelong relationships with a robot, we also need to understand the long-term dynamics behind these interactions. Hence, the robot needs to be able to adapt and to act, within its capabilities, alongside the human partner. In human-human interaction, small changes in sensorimotor communication can have a drastic impact on the partners' behaviour; sensitivity to, and understanding of, these small cues is therefore important for a robot. At the same time, human behaviour shows a broad and hierarchical variation in communication complexity. For this reason, the subtle cues sent by humans are not always central to the communicative interaction, and thus the robot needs to learn all the possible variations in a given context. Therefore, the robot will have to select the salient features among very different kinds in order to respond appropriately to the given social situation.
As a matter of fact, the embodiment of a system like a robot in social situations is defined as being dependent not only on its own sensory-motor experiences and capabilities but also on the environmental changes caused by social constraints (Dautenhahn, 1999). Thus, when robots move into social environments, they need to be able to take their surroundings, and the rules given by these surroundings, into account. This is why, looking from a robot's perspective, the interplay between its behaviour and the behaviour of its interaction partner needs to be considered carefully.
When concentrating on a long-term perspective of social interaction, the evaluation of a robot that can create a relation with a human is a very complex problem, still requiring a credible solution. When looking towards methodologies used in developmental psychology we can see that it is difficult to create a quantitative strategy to evaluate the long-term evolution of the interactions. Current state of the art robotics is facing exactly these problems with the evaluation of long-term interaction. Therefore, models like social symbioses or emotional states are explored in creating strategies to give a robot the capability of dynamic adaptation (Rosenthal et al., 2010).
Overall, evaluating human-robot interaction involves different levels of complexity. When evaluating human-to-robot interaction, we also need to take the robot's perspective, and therefore its capabilities, into account in order to examine the loop created in the interaction. Furthermore, the social rules created by environmental constraints, and therefore the full embodiment of a social interaction, need to be taken into account to functionally evaluate the success of the robotic design.

Conclusion
In conclusion, we propose that the field of HRI needs to move from a "rigid" social contact between (social) bodies towards a "soft" interaction. By "soft" interaction we mean dynamical compliance and long-term adaptability to human sensorimotor, cognitive and affective non-physical communication and interaction. In terms of control/planning, robotics is already moving from avoiding contacts with the environment to exploiting them, i.e., using them in motion control, thanks to new compliant actuation and sensors (Tsagarakis et al., 2010; Ferscha, 2013).
Along these lines, the field of HRI may be stimulated by current attempts to measure the real-time implicit information flow between human agents embedded in a complex and ecological scenario. In fact, the basic research in cognitive neuroscience outlined earlier may serve two critical functions. The first regards the principled building of a functionally effective human-robot interaction. The second, a technical advantage, is that the same methods and models used to measure human-human information flow can be applied to future HRI implementations as a benchmark to evaluate successful interaction with robots.
This means that the robot is no longer studied in separation from its physical, human-centred environment. Rather, robot and environment (or other agents) are blended together to plan a new optimal and collaborative way to move. Similarly, in social cognition, both for human-human and human-robot interaction, we feel the need for a change: from the study of individuals in isolation to the study of complex systems of two or more people interacting together. Most importantly, the latter must not be treated as a linear sum of single individualities, but again require a "blending" which is manifested by the subtle mechanisms of implicit communication and which is modulated by the context and the long-term interaction.
In general, we propose an integrated approach (as sketched in Figure 12.1) that starts with methods to quantify human-human interaction. The knowledge derived from this basic research is then translated into basic principles to build better robots. Finally, closing the conceptual loop, the approach uses those very same methods to functionally evaluate the efficacy of human-robot interaction.
Figure 12.1: This graphical depiction represents the need to rigorously quantify human-to-human interaction for two main purposes. The first is to derive useful principles to guide the implementation of robot-to-human interaction. The second is that such artificially built interaction needs to be evaluated against its natural benchmark. Closing this conceptual loop, in our opinion, is the only way to establish an effective HCC with an embodied artefact.