Neuroscience and Biobehavioral Reviews

Role of the social actor during social interaction and learning in human-monkey paradigms

Social interaction between primates is driven by their ability to predict others' behaviours, to learn from others' actions and to represent others' intentions. These abilities allow them to extract information by observation, to understand which action leads to which outcome and to maximize the efficiency of their own future behaviours. These processes have mainly been investigated by studying non-human primates observing conspecifics, but more recently an increasing body of work has adopted a human-monkey paradigm, and some studies have now convincingly shown that macaque monkeys understand human choices, consider them and can act accordingly. Two main hypotheses have been developed to explain macaque monkeys' ability to learn from humans: 1) the similarity between the behaviours of both species; 2) the presence of a non-ambiguous link between the observed action and its outcome. Based on the literature examined, the recent evidence appears to support the second. Non-social observational learning, meaning learning by observation of an inanimate agent, can be a powerful tool to understand the mechanisms underlying social interactions.


Introduction
Over the last century, the term 'social learning' has received numerous definitions and encompassed several social behaviours. The pioneering dissertations of Romanes (1884), Morgan (1900) and Thorndike (1898, 1911) built the history of social learning. In Animal Intelligence, Edward Thorndike (1911) proposed two definitions of imitation, based on which clues the observer uses to reproduce the demonstrator's behaviour. One defined imitation as the ability to learn from the results of the demonstrator's actions, and the other as the ability to learn from the demonstrator's actions themselves. Over the years, the word imitation has been associated with diverse social behaviours and lost part of its substance, encouraging the introduction of the generic term 'social learning' by Box (1984). Today, the term social learning refers to 'learning that is influenced by observation of, or interaction with, another (typically a conspecific) or its products' (Box, 1984; Galef, 1988; Heyes, 1994). This definition has been widely reused to describe how animals (observers) can extract information from other animals (demonstrators/actors). In this review, we briefly survey social learning in macaque monkeys before focussing on the human-monkey paradigm and its relevance to the study of social interactions. We then discuss the role of the vicarious aspect and the mere presence of the reward in social learning behaviours, and hypothesize on its role in social learning processes in comparison to non-social learning, which can be studied in the so-called 'ghost display' condition. Finally, we conclude by considering the possibilities offered by the human-monkey paradigm and non-social learning for the study of the neural bases of social behaviours.

Interaction with a conspecific
The very first studies on social learning in macaque monkeys used an apparatus made of two restraining cages and a sliding test-tray between them (Darby and Riopelle, 1959; Riopelle, 1960). Two objects, one rewarded and the other not, were placed on the tray. A demonstrator monkey attempted to find the rewarded object, based on its identity (Darby and Riopelle, 1959) or location (Riopelle, 1960). The second monkey, the observer, witnessed the choice of the demonstrator and later could make its own choice between the two objects. The authors demonstrated that the observer monkey was able to learn which was the rewarded object, particularly after observing an error by the demonstrator monkey. These findings provided the initial evidence that macaque monkeys could benefit from the observation of a conspecific performing an action, learning which choice led to the reward. Further evidence of the observational learning ability of macaques was provided by Myers (1970) in a study that ruled out the effect of social facilitation on learning in stumptail and rhesus monkeys. Myers observed an increase in learning rates and a decrease in latencies during the acquisition of a multiple schedule task after observation. Importantly, in this study, the observers could neither perform the response nor receive reinforcement while the demonstrators were showing the pattern of behaviour to be learned, and they were tested alone later. Among the factors important for observational learning, a key role in the extraction of information from others' actions has been assigned to the ability of macaque monkeys, as of other species of non-human primates, to follow the gaze of conspecifics (Emery et al., 1997; Tomasello et al., 1998; Goossens et al., 2012), even when it is directed to a space outside their own visual field (Goossens et al., 2012).
Moreover, macaques seem to be capable of following gaze in order to take the visual perspective of a conspecific (Canteloup et al., 2016), which has been defined as one of the key components necessary to represent and assess the mental states of others, a cognitive ability known as Theory of Mind (Meunier, 2017). More recently, three studies have demonstrated the social transmission of knowledge and the ability of monkeys, both macaques and capuchins, to glean information from the observation of a conspecific performing different kinds of task (Brosnan and de Waal, 2004; Subiaul et al., 2004; Meunier et al., 2007). In the first study, Brosnan and de Waal (2004) tested the ability of capuchin monkeys to learn a token's value by witnessing a conspecific partner's token exchange. The exchange was defined as the action of giving an inedible token to the experimenter and receiving food in return. The authors showed that, based on information acquired by watching a conspecific receiving different rewards during several exchanges, capuchin monkeys were able to acquire and even change their preference for tokens. In the second study, Subiaul et al. (2004) investigated cognitive imitation in rhesus monkeys using a simultaneous chains task, designed to rule out the possibility of an imitation of the actions of the other animal. The animals were trained to respond in a prescribed order to a set of pictures displayed on a touchscreen. In the social learning condition, the 'student' monkey could observe an 'expert' monkey through a plexiglass panel separating the two testing chambers. The 'student' monkeys benefited from such observation and were able to learn vicariously some of the list items after previously monitoring the choice of the expert. Because the task separated cognitive rules from motor actions, the imitation process studied by Subiaul et al. (2004) was clearly not simply motor, demonstrating for the first time the ability of cognitive imitation in macaque monkeys.
In the third study, confirming the earlier results of Darby and Riopelle (1959), Meunier et al. (2007) demonstrated that rhesus monkeys learned novel lists of objects faster after observing a demonstrator conspecific than alone by trial-and-error (Fig. 1, modified from Meunier et al., 2007). The authors reported that other members of a monkey group outside of a laboratory setting spontaneously observed when a conspecific was tested nearby, and not simply sporadically as reported previously (Custance et al., 2006). Indeed, contrary to Custance et al. (2006), who questioned the possibility of social learning in macaques, Meunier et al. (2007) provided evidence that they could learn new habits by observation of conspecifics when using the appropriate paradigm. The contribution of the work of Meunier et al. (2007) was to show that non-deprived macaque monkeys can apply abstract rules acquired by observation also in a semi-natural setting, and in some cases even immediately after sporadic glances at the demonstrator.
Likewise, the observation of a conspecific is involved in the acquisition of many primate fears and phobias. The body of work developed by Mineka and colleagues demonstrated that vicarious classical fear conditioning can drive the establishment of a persistent fear. They showed that rhesus monkeys who observed conspecifics behaving fearfully in the presence of real or fake snakes developed a persistent fear of snakes (Mineka et al., 1984; Cook et al., 1985). Moreover, extensive prior exposure to monkeys behaving non-fearfully with snakes suppressed this behaviour. This 'immunization' resulted in similar responses to snake and non-snake stimuli after the prior exposure (Mineka and Cook, 1986). The authors, in a further study, tested the reactions of observer monkeys to a conspecific reacting fearfully to snakes or to neutral objects (Mineka and Cook, 1993; Experiment 1). The observer's disturbance behaviours during this observation phase, evaluated through 12 different measures including, for example, fear withdrawal, eye aversion or threat, were highly correlated with the model monkey's disturbance behaviours. Thus, as for the observation of a behaviour leading to a positive outcome, social learning also extends to the significance of negative stimuli.

Interaction with a non-conspecific human agent
In the previous section we presented evidence that macaque monkeys are able to monitor and learn from the behaviour of their conspecifics. We ask now whether monkeys are also able to learn from other animals, in particular from other primate species, such as humans. The first studies, on very simple behaviours, offered contradictory results on the possible use of interactive paradigms involving monkeys and humans. Taking as an example the ability to follow the gaze of others, Anderson and colleagues observed that capuchin (Anderson et al., 1995) and rhesus monkeys (Anderson et al., 1996) were unable to use the experimenter's gaze towards the rewarded object as a cue to make their choice. Using head and eye cues, monkeys are able to follow the human gaze (Ferrari et al., 2002) and to co-orient visually with humans (Anderson and Mitchell, 1999), but failed to extract useful information about what they see and know (Anderson et al., 1996). Itakura et al. (1996) reached the same conclusion, observing that when no pointing was combined with the gaze, only 1 of the 40 non-human primates tested, including great apes, responded to or followed the direction of the human gaze. In the same way, macaque monkeys that observed a human opening different puzzle boxes (Rigamonti et al., 2005) or executing tool-use actions (Fattori et al., 2000) showed only very weak evidence of socially mediated learning. However, adopting a more naturalistic paradigm in the context of the free-ranging rhesus macaque colony of Puerto Rico (Rawlins and Kessler, 1986), Flombaum and Santos (2005) reported the first evidence that rhesus macaques can not only follow the gaze of the experimenter but also reason about their visual perception. They suggested that macaque monkeys are able to extract information about the direction of a human's gaze and to extrapolate what they can or cannot see. In their experiment, they offered the subject the opportunity to take a grape from two human 'competitors'. In all six variants of their experiment, one human was looking at the grape and the other looked elsewhere or had the eyes covered. In most cases, monkeys approached the human whose gaze was directed away from the food item, to take the grape without being detected. Monkeys used the human's gaze information to make task-relevant decisions. The authors hypothesized that their positive results depended on how well their paradigm matched the natural conditions in which monkeys use visual perception. These results were further confirmed by a series of studies which investigated the ability of different species of macaque to discriminate among several attentional states of a human agent, in a more complex task in which they had to point to the reward location to the experimenter in order to receive it (Canteloup et al., 2015a,b).
Fig. 1. Performance of 4 monkeys over 12 lists of 10 object-reward associations learned through individual learning by trial-and-error (T&E) and over 12 lists learned through social learning after having had the opportunity to observe a conspecific's individual learning session (LeO). The reported scores are the numbers of errors (means ± SEM) to reach a defined criterion (9/10 correct responses). All four monkeys benefited from observation in the same way with a mean decrease of 39% of errors to reach criterion. These results show the ability of macaque monkeys to extract information and benefit from the observation of conspecifics. Modified from Meunier et al. (2007).
Based on the results of these previous works, Genovesio and colleagues developed a human-monkey interaction paradigm in which monkeys and humans interacted by taking turns in a common workspace represented by a touch screen (Falcone et al., 2012a). In the Non-Match-To-Goal (NMTG) task, different objects were presented in pairs on the touch screen. The task rule required the macaque monkeys to switch from their choice on the previous trial to a different one (Fig. 2; modified from Falcone et al., 2012a). Performing the task alone, the monkeys were able to understand the task's rule, that is, to reject the object previously chosen and to select the other one (non-interactive trials). In a subset of trials, a human partner located beside the monkey in front of the touchscreen performed the task. The human performed one or more consecutive trials and, when he concluded the last trial of a sequence, the monkeys were required to follow the same rule, that is, to discard the object the human chose in the previous trial and to select the other one (interactive trials). In this experiment, the human partner always performed correctly, and the monkeys obtained the reward after both types of correct trials, performed by themselves or by the human. The task was not designed to study observational learning in monkeys but to test a more basic ability to interact with humans, that is, to observe and to consider the human's choices in order to perform correctly in interactive trials. The experiment illustrated that, in addition to monitoring their own choices, the monkeys were able to shift easily between observer and actor roles, to monitor the human partner's choices and to perform the task accordingly. They showed good performance in both non-interactive and interactive conditions, that is, after a self-acquired goal and after a human partner-acquired goal, respectively.
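The NMTG rule described above can be illustrated with a minimal sketch. It is not the authors' task code: the goal labels, the function names and the assumption that the "previous goal" is simply whatever was chosen on the preceding trial (whoever chose it, and whether or not it was correct) are hypothetical choices made only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the two response goals of one session.
GOALS = ("polygon", "cross")


def nmtg_target(previous_goal: Optional[str]) -> Optional[str]:
    """Return the correct goal under the non-match-to-goal rule.

    The correct goal is the one that differs from the goal acquired on
    the previous trial. On the first trial there is no previous goal,
    so None is returned (assumption: any choice is then acceptable).
    """
    if previous_goal is None:
        return None
    if previous_goal not in GOALS:
        raise ValueError(f"unknown goal: {previous_goal!r}")
    return GOALS[1] if previous_goal == GOALS[0] else GOALS[0]


def score_trials(trials: List[Tuple[str, str]]) -> List[bool]:
    """Score a sequence of (agent, chosen_goal) trials.

    The agent label ("monkey" or "human") is carried along but does not
    enter the rule: the previous goal counts in the same way whether it
    was acquired by the monkey or by the human partner.
    """
    previous: Optional[str] = None
    results: List[bool] = []
    for _agent, choice in trials:
        target = nmtg_target(previous)
        results.append(target is None or choice == target)
        previous = choice  # assumption: updated even after an error
    return results


# Two monkey trials, two human trials, then the monkey resumes.
session = [
    ("monkey", "polygon"),
    ("monkey", "cross"),
    ("human", "polygon"),
    ("human", "cross"),
    ("monkey", "polygon"),  # correct: differs from the human's last goal
    ("monkey", "polygon"),  # error: repeats the previous goal
]
```

In this sketch, scoring `session` marks the monkey's first trial after the human sequence as correct only because it departs from the human's last choice, which is the behaviour the interactive trials were designed to probe.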
This work brought clear evidence regarding monkeys' ability to interact socially with humans and promoted new learning studies with humans as the model, because the ability to interact could be considered a prerequisite for being able to learn from humans. This result was in contrast with the idea, advanced by Brosnan and de Waal (2004) and by Meunier et al. (2007), that observational learning could take place only from conspecifics.
Later, Falcone et al. (2012b) and Monfardini et al. (2014) challenged the notion that observational learning was limited to conspecifics by introducing new learning paradigms. While Falcone et al. (2012b) emphasized the importance of a vicarious reward, Monfardini et al. (2014) considered the similarity between both species as the critical factor to promote social learning. Monfardini et al. (2014) compared three different conditions of observational learning, one in which monkeys observed a conspecific and two in which they observed a human model. The 'stimulus-enhancing' human model captured the animal's attention and uncovered one of two objects to show the monkey the presence or the absence of a reward below the object, without consuming it if uncovered. In contrast, the 'monkey-like' human model mimicked the behaviour of a conspecific and, without attempting to capture the monkey's attention, displaced one of the two objects and ate the reward when uncovered. Using this paradigm, the 'stimulus-enhancing' human was detrimental for the following trial-and-error learning, whereas observing a conspecific or a 'monkey-like' human significantly enhanced the monkeys' performance.
One other aspect to consider in observational learning is the correctness of the demonstrator's performance. Most observational learning studies converge upon one conclusion: alone, humans and animals learn more from their own successes than from their own errors. On the contrary, the observation of others' mistakes appears to be more informative than others' correct choices (Templeton, 1998; Kuroshima et al., 2008; Monfardini et al., 2012, 2014; Isbaine et al., 2015). Monfardini et al. (2012) compared the performance of 14 humans and 6 macaque monkeys in learning which one of two pictures or objects was the rewarded one after a first choice made by chance. Both species learned better from positive than negative outcomes. In contrast, when monkeys or humans first observed a conspecific's choice, the opposite was observed: both species learned better from a negative than from a positive outcome in the social condition. In one of their subsequent studies, Monfardini et al. (2014) provided evidence that observational learning across species, with monkeys observing humans, followed the same pattern. Monkeys observing a 'monkey-like' human model learned more from its error than from its success. In a recent study (Ferrucci et al., 2019), this effect was confirmed. The authors tested observational learning in the context of the Object-In-Place (OIP) learning task (Fig. 3A-C, modified from Ferrucci et al., 2019). This task was previously used to study a peculiar form of individual learning, so-called one-trial learning, and the effect of lesions on this behaviour (Gaffan, 1994; Gaffan and Parker, 1996; Charles et al., 2004; Browning et al., 2005; Browning and Gaffan, 2008; Wilson et al., 2010). In this task, the animals exploit a number of clues composing a background scene to learn which one of two objects is the rewarded one.
This background scene is essential for one-trial learning (Gaffan et al., 1994) and has the function of a retrieval cue that enhances the animal's performance in the second presentation of the same pair of objects. Ferrucci et al. (2019) used a modified version of the OIP task in which monkeys observed a human partner choosing randomly between two objects in five successive problems. After these five problems, the first run ended, and the second run began with the presentation of the five identical problems in the same order. The monkeys received a reward at the end of correct trials in both versions of the task. Using this paradigm, their findings confirmed the conclusion of previous works: monkeys learned better from human errors than from a human's correct choices (Fig. 3E, modified from Ferrucci et al., 2019). The similarities in monkeys' behaviour when they observe a conspecific and when they observe a human partner (i.e. the ability to learn from both species, mainly from their errors) suggest that social interaction processes and their neural substrates can be studied using human agents in a human-monkey interaction paradigm (Falcone et al., 2016, 2017; Cirillo et al., 2018).

Vicarious versus non-vicarious reward in observational learning
As defined in the Merriam-Webster dictionary, the adjective vicarious is related to something 'experienced or realized through imaginative or sympathetic participation in the experience of another'. It corresponds to something 'experienced or felt by watching, hearing about or reading about someone else rather than by doing something yourself'. Following the results of Flombaum and Santos (2005) and their previous study (Falcone et al., 2012a), Falcone et al. (2012b) specifically tested the possibility to learn vicariously from human models through a very simple task. Indeed, the negative results of previous studies in macaque monkeys (Fattori et al., 2000; Meunier et al., 2007) and capuchin monkeys (Brosnan and de Waal, 2004) contrasted with the ability to interact with and monitor humans described by Falcone et al. (2012a) and deserved further study. In these previous studies (Fattori et al., 2000; Brosnan and de Waal, 2004; Meunier et al., 2007), the human models executed different actions such as tool-use actions, a token exchange or an object-discrimination task, but they all had in common that the food was not consumed by the experimenter. Instead, in the study of Falcone et al. (2012b), the human model grasped the positive object, made the reward visible to the monkey and consumed the piece of apple beneath the object during the monkey's observation phase. In this latter case, both monkeys used in the study were able to learn vicariously from a human model and to perform above chance level in the subsequent test phase. Therefore, vicarious rewards appeared to be the key factor necessary to promote learning. Monkeys observed a human model performing a behaviour leading to the reward and, importantly, consuming food below the object. As further evidence, Bevacqua et al. (2013), with an experiment similar to that of Brosnan and de Waal (2004), showed that monkeys can also learn the symbolic meaning of tokens from human models through vicarious reward.
They observed human models exchanging both tokens associated with zero value (neutral tokens) and tokens rewarded with a piece of apple. In the latter case, monkeys observed the human model consuming the reward rather than simply watching, as in the experiment of Brosnan and de Waal (2004). During the test phase, monkeys chose the token vicariously associated with the reward more frequently than the neutral token.
Using a monkey-monkey paradigm, Chang et al. (2011) confirmed that observing another monkey receiving a reward is vicariously reinforcing. The error rates of the actor monkey during an instrumental task were lower when the cue predicted a fluid reward to a second monkey than when it predicted a fluid reward to no one. Moreover, the authors tested the monkey's behaviour in the same but non-social paradigm, in which the second monkey was replaced with a collecting bottle, and the error rates of the actor monkey were identical in both conditions.
Fig. 2. A. The white circle represents the central stimulus. The disappearance of the grey rectangle represents the go signal. The purple polygon and the green cross represent the two possible response goals. In the illustrated example trial, which could be the first trial of a behavioural session, the response choice is toward the purple polygon. B. Sequence of successive trials during a human interaction session. Numbers indicate the trial position after the trial depicted in A. Each panel represents a response choice (phase highlighted by the dashed rectangle in A). The rule of the task was the following: the correct goal was always the goal that differed from the goal of the previous trial, acquired either by the monkey in the non-interactive condition or by the human partner in the interactive condition. In the human trials, monkeys had to monitor the human choices in order to act accordingly in their following trials as an actor. The human partner performed one to four trials successively, removed his hand from the front of the touchscreen and let the monkey perform his own trials. C. Behavioural performance. Bars represent the percentage of correct responses of two monkeys for interactive (blue) and non-interactive trials (red). The interactive trials are sorted depending on the number of trials previously made by the human partner (from 1 to 4). Modified from Falcone et al. (2012a).
In summary, the studies in which the human's behaviour was clearly associated with the consumption of the food after a correct response and its absence in case of error (Falcone et al., 2012b; Bevacqua et al., 2013; 'monkey-like' human condition in Monfardini et al., 2014) showed evidence of observational learning, with a higher learning rate in the non-rewarded than in the rewarded vicarious condition. Conversely, the other studies that never included food consumption by the human model, but that were limited to the presentation of the correct or erroneous response without a clear association with a received or missed reward, produced negative results (Fattori et al., 2000; Brosnan and de Waal, 2004; Meunier et al., 2007; 'stimulus-enhancing' human condition in Monfardini et al., 2014). It is however important to note that in the study of Fattori et al. (2000), the failure to learn by observation could also be attributed to the difficulty of the actions required, for example, to bring a piece of food out of reach by hand using a rack.
Ferrucci et al. (2019) used a paradigm in which the human's correct choices were associated with the reward delivery to the observer monkey, i.e. non-vicarious reward. The monkeys had to associate the human choice with the reward they received themselves at the end of the trial. In the specific case of the OIP task, the monkeys were able to monitor and extract the information about the human model's choice. They managed to repeat the choice of the same object after a human's correct response and to shift to the other object in case of human error.
Fig. 3. The Object-in-Place task and behavioural results in Ferrucci et al. (2019). A. Examples of stimuli displayed as objects in a problem and as feedback around the selected object. B. Temporal sequence of a trial. After an intertrial interval, the Central Target (CT) appears and the animal has to hold it to let the background scene and the two objects appear. The disappearance of the CT serves as a go signal for the monkey to choose one of the two objects. After a period of pre-feedback, the feedback, positive or negative, appears and the reward is delivered when appropriate. C. Example of a temporal sequence of the six runs. Five problems comprised the first run and were presented consecutively six times. All six runs comprised one complete session. D. Learning curves for an example monkey (Monkey S). The curves show the mean percentage of correct choices for the six runs in the 60 sessions for the three different conditions (MA: Monkey Alone, HI: Human Interaction, CI: Computer Interaction). Vertical bars represent the separation between runs always performed by the monkeys (on the right) and first runs performed by different agents depending on the experimental condition (on the left). Dashed lines represent chance level (50%). As expected, performance in the first run does not differ from chance in any case. The mean percentage of correct choices in the second run is above chance in all conditions and for all three monkeys, illustrating that learning occurred after a single trial. Error bars represent the standard error of the mean (SEM). E. Performance in the second run after correct responses and errors made during the first run. The scatterplot shows the percentage of correct responses in the second run for the three monkeys in the three conditions, MA, HI, and CI. Trials were divided into two groups based on the performance in the first run, correct response or error. AC: after correct; AE: after error. Stars indicate significant differences (*, p < 0.05, two-sample test for equality of proportions with continuity correction; **, p < 0.01, two-sample test for equality of proportions with continuity correction). Modified from Ferrucci et al. (2019).
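The caption of Fig. 3E refers to a two-sample test for equality of proportions with continuity correction. The sketch below shows one common form of that test, the pooled chi-square statistic with Yates' correction on one degree of freedom, written with the Python standard library only. The function name is hypothetical, the counts in the usage note are invented purely for illustration and are not data from Ferrucci et al. (2019), and the text does not say which software the authors used.

```python
import math
from typing import Tuple


def two_proportion_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided test that two proportions are equal, with continuity correction.

    x1, x2: numbers of successes (e.g. correct second-run choices);
    n1, n2: numbers of trials in each group (e.g. after error, after correct).
    Returns (chi-square statistic, p-value) on one degree of freedom.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    if not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("successes must lie between 0 and the group size")
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when all outcomes are identical")
    p1, p2 = x1 / n1, x2 / n2
    # Yates' correction, capped so that it never exceeds the observed deviation.
    yates = min(0.5, abs(p1 - p2) / (1 / n1 + 1 / n2))
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

As a usage example with made-up counts, `two_proportion_test(80, 100, 60, 100)` compares 80% correct in one group of 100 trials with 60% correct in another; the returned p-value would then be compared against the 0.05 and 0.01 thresholds marked by the stars in the figure.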
The body of work of Monfardini and colleagues (Meunier et al., 2007; Monfardini et al., 2014) suggested that the main factor of success of the studies using a human-monkey interaction paradigm was the similarity of the model. The authors argued that reward consumption ensures knowledge transmission from human to monkey only because it helps to create 'like-me-ness' between the actor and the observer, and that the reward does not mediate social learning per se. Indeed, macaque monkeys can imitate human actions (Kumashiro et al., 2003) from their first days of life (Ferrari et al., 2006) and capuchin monkeys can copy the action of a conspecific without reward (Bonnie and de Waal, 2007), suggesting that identification, bonding and interaction with the other are fundamental for knowledge transmission. Studying the ability to learn by observation from conspecifics or humans sharing different degrees of affinity with the subject could help to understand the importance of these processes in knowledge transmission and more generally in social cognition. Bonnie and de Waal (2007) showed that, after observing an actor monkey repeatedly choosing the same empty box out of three possible boxes, the observer monkeys chose more often the box previously chosen by the actor monkey. Their results demonstrated that imitation is possible without any outcome. However, under such conditions, this does not imply that the observer monkeys extracted information from the actor monkey and learned from its behaviour, only that they copied the observed behaviour. Indeed, the studies requiring an understanding of the other's action showed that monkeys can go beyond copying and can learn from others' mistakes by changing the incorrect choice after observing an error (Monfardini et al., 2014; Ferrucci et al., 2019). In the next section we will examine another possible explanation of the monkeys' success in learning by observation.
Table 1 summarizes all studies that, to our knowledge, tested explicitly or as a control, the ability of macaque monkeys to extract information from a human partner behaviour.

Ghost display condition and the role of the agent in the learning process
The differences in the observational learning highlighted by Monfardini et al. (2014) between 'monkeylike' and 'stimulus-enhancing' human models and the evidence that monkeys can extract information from human models also in other paradigms (Falcone et al., 2012a;Ferrucci et al., 2019) has raised the question of the role of the social agent in observational learning. Which aspects of an observed behaviour are essential to extract information from the model? Is the similarity between monkeys and humans sufficient to explain it? An alternative explanation has to be considered. Indeed, the clear relation between the observed choice and the outcome could be the key component for the extraction of information from the agent and introducing non-social conditions such as the 'ghost display' can help to answer the question. The 'ghost display' condition was initially studied in an experimental paradigm in which the experimenter moved a manipulandum appearing to move by itself, without the intervention of an external physical agent, without being seen by the animal (Fawcett et al., 2002). The study of such a 'non-social' agent is a very interesting tool to study better the role of the social aspects of the agent in the observational learning processes. This question was addressed mainly using great apes or children (Hopper et al., 2007(Hopper et al., , 2008Tennie et al., 2006) and reached contrasting results (for a review, see Hopper, 2010). It depended on the experimental parameters, the nature of the task and the complexity of the behaviour to be executed, from an observed choice to the observation of the utilization of a tool. To our knowledge, only two studies have examined the ability of macaque monkeys to learn in such conditions, providing contrasting results (Subiaul et al., 2004;Ferrucci et al., 2019). Subiaul et al. (2004) were the first to report cognitive imitation in macaque monkeys, as mentioned in the previous section. 
However, when they tested the ability of the same monkeys to learn only from visual and auditory feedback, they failed to report evidence of learning in this non-social observational condition. Ferrucci et al. (2019) tested whether macaque monkeys were able to learn not only from humans as we described before but also from a computer's choices in the specific context of the OIP task that measures one-trial learning. In this paradigm, the monkeys received the reward at the end of the computer correct choices and in this context, they found evidence of one-trial learning in non-social observational conditions. They showed that monkeys were able to learn from choices generated without social agents and that was possible even after a single observation ( Fig. 3D; modified from Ferrucci et al., 2019). The key factor to promote learning may be the reward delivery that occurred even after a computer's correct choice in the study of Ferrucci et al. (2019). In contrast, the absence of reward delivery might have made it difficult for the monkeys to establish a link between the computer's action with the visual and auditory feedback in the study of Subiaul et al. (2004). Ferrucci et al. (2019) have suggested that building an unambiguous association between a computer's choice and a monkey's reward consumption could be critical to allow learning even as fast as in one trial in the 'ghost display' condition. This learning can take place only when the animals are enabled to grasp the link between the observed action and the reward or its absence (received by the 'monkey-like' human in Monfardini et al. (2014), by the human in Falcone et al. (2012b); Bevacqua et al. (2013) and Isbaine et al. (2015) and by the monkey himself in Falcone et al. (2012aFalcone et al. ( , 2016Falcone et al. ( , 2017, Cirillo et al. (2018), Ferrucci et al. (2019). However, an additional factor to be considered comparing the results of Ferrucci et al. (2019) with the negative finding of Subiaul et al. 
(2004) is related to the specific nature of the OIP task, which offers a myriad of cues that form the scene. In the OIP task, the learning process is thought to be enhanced by the background scene, which serves as a retrieval cue for each problem encountered previously. Indeed, the absence of such a cue in the conditional discrimination task has been shown to prevent one-trial learning (Gaffan, 1994). In summary, according to Ferrucci et al. (2019), the important factor for understanding another's action is the construction of a clear association between the observed action and its outcome.

Perspectives in primate electrophysiology
The capability of macaque monkeys to apply abstract rules by observation has been exploited in a number of electrophysiological studies, as in the role-reversal task developed by Yoshida and colleagues (Yoshida et al., 2011, 2012). Their objective was to investigate the neural substrate for the representation of the action of another agent. In their task, two monkeys alternated between the roles of actor and observer and, to maximize the quantity of reward they could receive, each had to monitor its partner's choice. They reported that a population of neurons in the medial frontal cortex selectively encoded the other's action (Yoshida et al., 2011) and, interestingly, also took part in the monitoring of others' erroneous actions at the single-cell level (Yoshida et al., 2012). Although the representation of others' actions and rewards has been investigated over the last two decades, since the discovery of the fronto-parietal mirror system (Gallese et al., 1996, 2004; Rozzi et al., 2008; Caggiano et al., 2009; Bonini et al., 2010; Yoshida et al., 2011; Azzi et al., 2012; Hosokawa and Watanabe, 2012; Chang et al., 2013; Baez-Mendoza et al., 2013; Lanzilotto et al., 2017), it is only recently that the study of anticipatory or predictive signals of others' behaviour has become a topic of interest (Haroush and Williams, 2015). The use of a human-monkey paradigm to study the representation of others' behaviour offers the possibility to remove the animals' uncertainty about others' intentions and allows the study of predictive behaviour when a delay is introduced before the human choice. Indeed, in the experiments that used the NMTG task, no errors were made by the human agent, and this unambiguous behaviour favoured the monkey's prediction of the next human choice. Several frontal areas have been studied using this paradigm. 

Table 1. An overview of studies which have incorporated a human-macaque monkey interaction paradigm.

A first study in the lateral prefrontal cortex revealed neurons with predictive or anticipatory activity for the human agent's future choice (Falcone et al., 2016). Some of these neurons lacked specificity, were similarly involved in the representation of the monkey's choice, and may represent a covert mental simulation; others encoded only the human's future choice and could represent an agent-specific prediction of the other agent's choice. In the medial prefrontal cortex (area 9), a large group of cells showed, during the delay, predictive activity for the human's future choice but not for the monkey's choice (Falcone et al., 2017), while other neurons encoded both. These cells, however, also exhibited independent predictive activity in terms of their spatial tuning, meaning that a single cell could participate in a different computation depending on who prepared and performed the action; for example, it could show greater activity for the left target in the human trials and for the right target in the monkey trials. The switch between actor and observer in the NMTG paradigm also revealed specific features of the role of the dorsal premotor cortex, such as the presence of distinct neuronal substrates that differentiate self and others' behaviours (Cirillo et al., 2018), a result that differs from those obtained with the passive observation used in other studies (Cisek and Kalaska, 2004; Tkach et al., 2007). Combined with the possibility of exploiting the monkey's ability to learn from a computer's choice, the human-monkey interaction paradigm opens a promising new line of research that aims to understand the neural bases of observational learning. It could be a powerful tool for investigating the neural bases of individual and observational learning from both humans and artificial agents. Controlling the agents' performance makes it possible to generate confidence in prediction and can be used to study the predictability of others' social and non-social behaviours.

Declarations of interest
None.

Funding
This work was supported by the ERC 2014 grant (European Research Council, 648734-HUMO).