Abstract

Interactive or collaborative pick-and-place tasks occur during all kinds of daily activities, for example, when two or more individuals pass plates, glasses, and utensils back and forth between each other when setting a dinner table or loading a dishwasher together. In the near future, participation in these collaborative pick-and-place tasks could also include robotic assistants. However, for human-machine and human-robot interactions, interactive pick-and-place tasks present a unique set of challenges. A key challenge is that high-level task-representational algorithms and preplanned action or motor programs quickly become intractable, even for simple interaction scenarios. Here we address this challenge by introducing a bioinspired behavioral dynamic model of free-flowing cooperative pick-and-place behaviors based on low-dimensional dynamical movement primitives and nonlinear action selection functions. Further, we demonstrate that this model can be successfully implemented as an artificial agent control architecture to produce effective and robust human-like behavior during human-agent interactions. Participants were unable to explicitly detect whether they were working with an artificial (model controlled) agent or another human coactor, further illustrating the potential effectiveness of the proposed modeling approach for developing systems of robust real/embodied human-robot interaction more generally.

1. Introduction

1.1. Introduction

Moving objects from place to place is a common daily activity. Whether picking up a dish and placing it in a dishwasher or selecting a part from an assembly line for manual construction, such pick-and-place behaviors often involve a repeated sequence of goal-directed actions. Pick-and-place behaviors also often occur within social contexts, requiring multiple individuals to coordinate pick-pass-and-place action sequences together, for example, when handing books to a colleague to fill a shelf or passing plates when setting a dinner table with friends. In these collaborative pick-and-place contexts, coactors often act in a highly efficient and successful manner with minimal communication or explicit prior planning. Indeed, the coordinated patterns of multiagent pick-and-place behavior are often best understood as an emergent consequence of the real-time perception and actualization of actor-specific action possibilities (i.e., affordances) that structure a (multi-) agent-environment task space [1].

Pick-and-place tasks have become central to the development of robust and adaptive human-machine or human-robotic interaction (HMI and HRI, respectively) in part due to their ubiquity in everyday life [2–5]. One challenge with regard to modelling multiagent pick-and-place behavior for HMI/HRI, however, is that using high-level task-representational algorithms or preplanned action or motor programs for controlling artificial agents quickly becomes intractable, even for simple interaction scenarios, such as when a robot and a human must coordinate to move a collection of objects from one side of a table to another [6–8]. A method to avoid the intractability of high-dimensional and variable planning problems in HMI/HRI is to reduce an artificial agent’s potential action space within a complex interaction context by imposing human-like constraints on the task/behavioral dynamics [9, 10] of the artificial agent [11, 12].

Research on human motor control [9, 16] (e.g., [17]) has revealed that human goal-directed actions are composed of two fundamental movement types: (1) discrete movements, such as reaching for an object or target location, hitting, kicking, or throwing a ball, and (2) rhythmic movements, such as waving a hand, hammering a nail, or simply walking. The significance of this finding is the implication that human movement activity reflects two fundamental behaviors of nonlinear dynamical systems, namely, point-attractor dynamics for discrete movements and limit-cycle dynamics for rhythmic movements [9–12, 16, 18]. This also implies that the behavioral dynamics of human activity can be derived and modelled from these two types of dynamical movement primitives. Consistent with this possibility, numerous human movements and actions like reaching, wiping, cranking, jumping, drumming, throwing, hitting, and bouncing have been successfully modelled using relatively simple task-specific systems of fixed-point and limit-cycle attractors acting on corresponding end-effectors (e.g., hands for reaching) or limb-joint systems [19–23]. Similar dynamical movement primitives have also been employed to model the behavioral dynamics of human goal-directed navigation within an obstacle-ridden environment, including human route selection and switching behaviors [24, 25]. Motivated by the understanding that many human behaviors can be modeled using point-attractor and limit-cycle systems, several researchers have demonstrated how similar dynamical movement primitives can significantly reduce the dimensionality of behavioral control in artificial humanoid and robotic systems [4, 16, 18, 26–28]. For instance, Ijspeert and colleagues [11] have shown how dynamical movement primitives can be employed to generatively train a virtual end-effector or multijoint robotic arm to perform a range of tasks, from goal-directed reaching and obstacle avoidance to racket swinging.

With regard to complex multiagent activity, behavioral dynamic models composed of dynamical motor primitives have been employed to capture and understand the stable patterns of coordinated perceptual-motor behavior across a range of discrete and rhythmic interpersonal task contexts, e.g., [15, 29–31]. Of particular relevance here is the recent work by Lamb and colleagues [13] demonstrating how collaborative pick-and-place behaviors could be modelled effectively using a hierarchical behavioral dynamic model that captured both the movement trajectories of actors and inter-actor pass decisions. That is, the behavioral dynamic model was able to effectively simulate the movement behaviors of the agent’s end-effector (i.e., hand movements), as well as the dynamics of action selection (i.e., to pass or not to pass) during ongoing task behavior (for related research on modeling action selection dynamics see, e.g., [32, 33]). The aim of the current study is to extend this latter work and bridge the work on dynamical movement primitives and behavioral dynamic modeling by investigating whether multiagent hierarchical behavioral dynamic models derived from dynamical movement primitives can be employed to control artificial agents for HMI/HRI [4, 13, 30, 31, 34]. Indeed, although dynamic movement primitive models have been implemented in HRI contexts, they are not typically derived from human movement data. In contrast, although behavioral dynamic models have been derived from human movement data, they have not been well explored in the context of HMI/HRI. The current study, therefore, brings these related areas of research together into a single human-agent task context in order to demonstrate and validate methods for using human interaction-derived hierarchical task-dynamic models as an agent control architecture.

The current study builds on observations from a human-human interaction (HHI) experiment that explored the behavioral dynamics of a more interactive (complicated) pick-and-place task than the one employed by Lamb et al. [13], in which only one agent made decisions and initiated task actions. In the current experimental task, both coactors could make decisions about and initiate task actions, allowing for the possibility of greater influence and range of task action combinations. Further, we implemented an extended version of the hierarchical behavioral dynamic model proposed by Lamb et al. [13] within the control architecture of an artificial human agent in virtual reality. The proposed pick-and-place agent (PAPA) was expected to illustrate how this hierarchical behavioral dynamic model can be used not only to capture and predict human behavior, but also to enact human behaviors in a collaborative human-agent interaction (HAI) task context as a generalized form of HRI/HMI. More specifically, we compared the behavior of PAPA and its human coactor to the behaviors of humans working together with other humans to demonstrate the application of hierarchical behavioral dynamic models (i.e., models that capture both movement trajectories and action selection decisions) for effective and robust human-agent pick-and-place task behavior, which could be implemented in any appropriate agent-controlled system.

1.2. Original Model
1.2.1. Directed Reaching Dynamics

With regard to pick-and-place behaviors in both individual and multiagent contexts, the hand movements of individuals engaged in a goal (target) directed reaching task can be modeled using a modified version of a behavioral dynamic model first introduced in the context of goal-directed locomotion [13, 25, 35–37] (see also [9, 11]). The model characterizes the movements of an agent’s end-effector in terms of its heading direction, $\varphi$, such that

$$\ddot{\varphi} = -b\,\dot{\varphi} - k_g\,(\varphi - \theta_g)\left(e^{-c_1 d_g} + c_2\right). \tag{1}$$

Here, $\dot{\varphi}$ and $\ddot{\varphi}$ correspond to the velocity and acceleration of the agent’s end-effector heading angle, respectively, and $b$ and $k_g$ are damping and spring/stiffness terms, such that $-b\dot{\varphi}$ acts as a friction force on turning rate, and the function $-k_g(\varphi - \theta_g)$ operates to minimize the difference between the current heading angle, $\varphi$, and the angle, $\theta_g$, of the corresponding subtask goal/target location (see Figure 1). The distance of the agent from the current goal location is defined by the Euclidean distance, $d_g$, with the function $(e^{-c_1 d_g} + c_2)$ introducing an exponentially decaying term characterized by a constant offset parameter, $c_2$, that ensures that the rate of change in heading direction never goes to zero [35] and an exponential decay rate, which is a function of the constant parameter $c_1$ and the distance to the goal, $d_g$. Notably, while (1) is a purely reactive model of end-effector movements, it is capable both of producing human-like approach trajectories and of ensuring that the system does not get trapped in local minima that would keep it from arriving at its goal [13, 27, 36, 37].
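The behavior of a heading model of this kind can be illustrated with a short numerical sketch. The Python snippet below Euler-integrates an angular damped spring of the assumed form $\ddot{\varphi} = -b\dot{\varphi} - k_g(\varphi - \theta_g)(e^{-c_1 d_g} + c_2)$, which matches the verbal description of (1), while the hand advances along its current heading. The function name, the symbol names, the constant hand speed (used here in place of the velocity dynamics), and all parameter values are illustrative assumptions rather than values reported in this paper.

```python
import math


def simulate_reach(start, goal, heading=0.0, speed=0.5,
                   b=20.0, k_g=100.0, c1=0.4, c2=0.4,
                   dt=0.001, tol=0.01, max_steps=20000):
    """Euler-integrate assumed heading dynamics until the hand is within tol of goal.

    All parameter values are placeholders chosen for a table-sized workspace.
    Returns the visited positions and a flag saying whether the goal was reached.
    """
    x, y = start
    phi, phi_dot = heading, 0.0
    path = [(x, y)]
    for _ in range(max_steps):
        dx, dy = goal[0] - x, goal[1] - y
        d_g = math.hypot(dx, dy)            # Euclidean distance to the goal
        if d_g < tol:
            return path, True
        theta_g = math.atan2(dy, dx)        # angle of the goal from the hand
        # Heading error wrapped to [-pi, pi] so the spring turns the short way.
        err = math.atan2(math.sin(phi - theta_g), math.cos(phi - theta_g))
        phi_ddot = -b * phi_dot - k_g * err * (math.exp(-c1 * d_g) + c2)
        phi_dot += phi_ddot * dt
        phi += phi_dot * dt
        # Assumption: constant speed along the current heading.
        x += speed * math.cos(phi) * dt
        y += speed * math.sin(phi) * dt
        path.append((x, y))
    return path, False
```

For example, `path, reached = simulate_reach((0.0, 0.0), (1.0, 0.5))` starts with the hand heading along the x-axis and lets the heading dynamics alone steer it toward the goal; no trajectory is planned in advance.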

Typically, human hand movements in directed reaching tasks exhibit a bell-shaped, nonconstant velocity profile [38, 39]. This velocity dynamic can be modeled by (2), in which damping and stiffness terms act on the rate of change of the end-effector’s velocity, which increases and decreases as a function of the current goal distance [13, 40]. When the end-effector is far away from the target location, velocity increases; as the distance to the goal location decreases, however, the distance-dependent term approaches zero and velocity decreases accordingly. A constant parameter specifies the maximum velocity in m/s, such that the same equation can be used for a wide range of different movement distances, with differential peak velocities resulting for shorter and longer distances.

1.2.2. Action Selection Dynamics

The movement model introduced in the previous section assumes that movements are directed at a single goal location only. As a result, when multiple goal locations are available, an additional system is required to handle switching between goal states. For example, in Lamb et al. [13], participants were allowed to choose between passing a task object to another person or taking it to an indicated target location. To capture the dynamics observed (i.e., metastability and hysteresis), these pass decisions were modeled using a nonlinear first-order ordinary differential equation (ODE), (3), in which the first derivative of the current decision state is a function of that decision state and an agent-normalized task parameter defined by a task-specific parameter equation. This system exhibits a saddle-node bifurcation as the task parameter is scaled up or down past a critical value (approximately 0.35) (see Figure 2). For parameter values within the critical range, (3) exhibits a region of bistability, which corresponds to the hysteretic behavior observed in human participants when the task parameter is smoothly scaled (see also [32] for an example of how this same system can capture the pass-or-not-pass dynamics in rugby). For parameter values below and above this range, the system has a single stable fixed point in each case. In Lamb et al. [13] it was shown that the decision to pass the object to a coactor or complete the task alone was driven by the agent’s distance to the target relative to their reach capability, as expressed in (4). Note that the two constant scaling factors in (4) operate to normalize the task-specific parameter with respect to the current task space.
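The hysteresis that such a decision system produces can be illustrated numerically. The sketch below assumes one canonical first-order ODE with the properties described, $\dot{x} = k + x - x^3$, where $x$ is the decision state and $k$ the task parameter; this form, the symbol names, and the sweep range are assumptions, not the equation reported as (3). For this assumed form the saddle-node points lie at $|k| = 2/(3\sqrt{3}) \approx 0.385$, which is close to, but not identical with, the critical value of approximately 0.35 quoted above, so the original system may be scaled differently. Each parameter value is integrated for 100 Euler steps of 0.01, starting from the previous state.

```python
def relax(x, k, steps=100, dt=0.01):
    """Euler-integrate the assumed decision ODE dx/dt = k + x - x**3."""
    for _ in range(steps):
        x += (k + x - x ** 3) * dt
    return x


def sweep(x0, ks):
    """Carry the decision state across successive task-parameter values."""
    x, states = x0, []
    for k in ks:
        x = relax(x, k)  # the previous state is the next initial condition
        states.append(x)
    return states


def switch_point(ks, states):
    """Return the first parameter value at which the decision state changes sign."""
    for k, prev, cur in zip(ks[1:], states, states[1:]):
        if prev * cur < 0:
            return k
    return None


# Scale the parameter smoothly up and then back down again.
ks_up = [i / 100 for i in range(-100, 101)]
ks_down = ks_up[::-1]
states_up = sweep(-1.0, ks_up)
states_down = sweep(states_up[-1], ks_down)
k_up = switch_point(ks_up, states_up)
k_down = switch_point(ks_down, states_down)
```

Here `k_up` and `k_down` are the parameter values at which the decision flips on the upward and downward sweeps. They fall on opposite sides of zero, so the current decision depends on the decision history as well as on the current parameter value.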

1.3. Current Study

The current study had two related aims. The first aim was to extend the Lamb et al. [13] pick-and-place hierarchical behavioral dynamic model to a more interactive, free-flowing, two-person cooperative pick-and-place task, in which both actors were free to select, move, pass, and organize a constant stream of objects. The second aim was to validate whether the resultant model could be implemented successfully into an artificial agent (virtual avatar) capable of producing effective and robust human-like behavior during human-machine interaction.

The current study uses a general human-agent interaction (HAI) paradigm which subsumes the structure of both HMI and HRI. Two virtual reality experiments were conducted to achieve the study aims. In the first experiment, pairs of naïve human actors completed a virtual pick-and-place task cooperatively. They were required to pick up and move a constant stream of colored discs, presented one at a time, from one end of a virtual tabletop to a corresponding colored target positioned at the other end of the virtual tabletop. Participants stood on opposite sides of the virtual table and were free to choose when and who picked up and moved the discs and whether to move the discs to the target location alone or by passing them between each other [40]. Of particular relevance for extending the Lamb et al. [13] model was determining and mathematically modeling the dynamics of the object pick-up decisions, as in the previous task only one participant picked up the object. Based on the pass decision dynamics defined in (4), we expected that a participant’s object pick-up decision would be functionally related to arm length or, more specifically, comfortable reaching distance. However, we also expected that participant pick-up decisions would be a function of the relative distance of their own and their coactor’s current hand location with respect to the location of the object to be picked up. In other words, pick-up decisions would be a relative function of both reachability and proximity.

The same virtual pick-and-place task was employed in Experiment 2, except that naïve individual human actors were recruited to complete the task with a virtual avatar whose movements and action decisions (i.e., object pick-up and pass decisions) were controlled by the extended Lamb et al. (2017) hierarchical behavioral dynamic model identified from the results of the human-human testing conducted in the first experiment. The expectation was that the behavioral dynamics exhibited during human-artificial system testing would be qualitatively and quantitatively similar to human-human performance because PAPA was derived from human-human interaction observations. We also manipulated whether actors knew if the movements and decisions of their virtual coactor were computer-controlled. Of particular interest was whether participants who were led to believe that the coacting avatar was human-controlled would be able to discern this deception. If the interpersonal pick-and-place model developed here was able to effectively capture the dynamics of human performance, then participants should be unaware of the deception (i.e., participants should believe they were working with another human actor when told so, even though they were not).

2. Human-Human Interaction Experiment

2.1. Method
2.1.1. Participants

Twenty University of Cincinnati undergraduate students (14 females and 6 males; aged 18 to 28 years, all right-handed) were recruited to participate in Experiment 1. Participants participated as pairs, though they did not necessarily know each other prior to the experiment. All participants were recruited via the Psychology Department’s online recruitment system and received partial course credit for participation. Participants provided written consent prior to completing the study, with the procedures and methodology employed reviewed and approved by the University of Cincinnati Institutional Review Board.

2.1.2. Materials and Apparatus

An illustration of the experimental setup and task is provided in Figure 3. Participants stood on either side of a 1.65m × 0.89m × 0.995m table in a laboratory room and completed the pick-and-place task in a virtual environment. Participants stood across from one another in both the laboratory room and the virtual environment. The virtual environment consisted of a room similar to the laboratory room, with a virtual table that was the same size as the real laboratory table. The physical table provided a solid surface on which participants could move a hand-held wireless Polhemus Latus motion-sensor (Polhemus Ltd, Vermont, USA) that tracked their right-hand movements at 96 Hz.

The virtual environment was presented to participants using Oculus Rift (DK2) virtual reality headsets (Oculus VR, Irvine, California) and was designed using the Unity 3D game engine (version 5.2.0; Unity Technologies, San Francisco, California) and SketchUp 2015 (Trimble Navigation Technologies, Sunnyvale, California). The Oculus Rift presented the 3D environment using a pair of 1920 × 1080 screens arranged to produce stereoscopic 3D images at approximately 75 Hz with a 100° FOV. Positional head tracking was provided by the Rift’s Crystal Cove tracking system. The maximum display latency between the participants’ real-world movements and their movements in the virtual environment was 33 ms, with experimental task states and movements recorded at 70 Hz.

As illustrated in Figure 3, the participant standing position on side “A” of the table was slightly closer to the object appearance side of the table compared to the participant standing position on side “B” of the table. This ensured that the (tracked) right-hand of each participant was equidistant from both the object appearance range and the target (object-drop-off) locations.

Within the virtual environment, the participants were represented as identical avatars modeled after a crash test dummy with a height of 1.8m. The avatar’s right hand was represented by a semitransparent blue sphere in order to simplify interaction with the task environment. The movements of this virtual hand were defined by the position of the participant’s hand-held Polhemus motion tracking sensor. Avatar head movements were mapped to actual participant head movements tracked using the Oculus Rift’s Constellation tracking system. These hand and head motions were integrated with an inverse kinematics controller (model and controller supplied by Root Motion, Tartu, Estonia) in order to generate related right arm (e.g., elbow angle and forearm orientation) and upper-body (torso) movements of the participants’ virtual avatars. The resulting arm and upper-body movements were not identical to the real-world arm and body movements of the participants but were deemed to be close enough to render any differences between the real and virtual body postures of the participants unnoticeable or not functionally distinct.

2.1.3. Pick-and-Place Task

As illustrated in Figure 3, participants were immersed within a virtual environment including a virtual table mapped to a lab room table. Disc objects for pick up (henceforth discs) were presented to participants on one end of the table and were color coded (magenta, yellow, green, blue, or red) to indicate a specific target location. Discs were presented randomly within a region near the side of the table occupying the middle third of the table. Participants were instructed to pick up the discs when they appeared and move them to the target location of the corresponding color as quickly as they felt comfortable. Target locations and colors were fixed across all participants and trials (see Figure 3). Participants were informed that either one of them could pick up a disc when it appeared, but only one individual could hold the disc at a time. Importantly, participants were also informed that if the target was either too far away or uncomfortable to reach, they could pass it to their coactor.

A pick-up occurred when the participant’s sphere came in contact with the disc. When picked up, the disc moved with the participant’s sphere until it reached the target or the individual released (dropped) the disc. A participant could release (drop) a disc anywhere on the tabletop by lifting their hand/sphere up and away from the tabletop. Once a disc was dropped, either individual could then pick up the disc. A pass involved one participant picking up the disc at the appearance location and then moving and releasing the disc partway across the tabletop for the second participant to then pick up and move the disc the rest of the way to the target. Note that although it was not explicitly prohibited, no back-and-forth passing was observed in the study.

2.1.4. Procedure

Upon arrival, participants were informed that the experiment was investigating the dynamics of a two-person pick-and-place task. Participants were then randomly assigned to side A or B of the table and were positioned in their assigned table locations. Participants were then instructed to secure the Oculus Rift HMD on their head and their first-person view was calibrated to be properly aligned with their avatar’s head height. Following task instructions (see Section 2.1.3), participants completed 2 practice blocks to acclimate to the task environment and mechanics.

The first practice block consisted of 12 trials, in which a green disc always appeared in the center of the appearance region and had to be moved to the middle (green) target. Each participant took 6 turns picking up the disc and was instructed to pick up the object and take it to the target 3 times on their own and to pick up the object and then pass it to their coactor 3 times. The second practice block involved 20 trials, 4 trials for each target location (i.e., 4 discs of each color). In this practice block, discs appeared in a random y-axis pick-up location within the appearance range on each trial and participants were instructed to complete the task as quickly as they felt comfortable and to make their own decisions about if and when to pass.

After participants completed both practice blocks and indicated that they understood the objective of the task, they completed two experimental trial blocks. Each experimental trial block included 150 trials, 30 trials for each target color presented in a random order. In between the first and second block of experimental trials, participants switched sides (i.e., the participant on side A moved to side B and the participant on side B moved to side A) and were given a 5-minute break. Each experimental block lasted between 10 and 15 minutes.

2.2. Results and Discussion
2.2.1. Decisions

There were two decision events in the pick-and-place task: (1) a decision to pick up or not pick up the object and (2) a decision to pass or take the object to the target after the object was picked up. In order to understand the basis for the pick decision, we applied the C4.5 decision tree algorithm [41] with 10-fold cross-validation to participant pick decisions (N = 2998), creating a decision tree with a minimum node size of 50 instances. Attributes that were considered for each participant included the hand’s current distance to the target, disc location, target location, participant waist height, and the hand’s resting location. An attribute was not considered relevant to modeling the decision behavior if it was not included in the decision tree produced by the C4.5 method or if its exclusion resulted in a change in predictive success of < 3%. Using this exclusion criterion, the resulting decision tree was able to correctly predict 86% of the pick decisions using only the current distance of each actor’s hand to the pick-up location and each actor’s height and right arm length.

The C4.5 decision tree algorithm was also applied, using 10-fold cross-validation, to the data set of passing decisions in order to create a decision tree with a minimum node size of 50 instances. The same set of attributes considered for the pick-up decision was considered for the pass decision, with the addition of the previous pass decision. Likewise, the same exclusion criterion was used to determine which attributes were relevant to the pass decision. Using this method, 79% of the pass decisions were predicted by a decision tree constructed from only the distance of the resting location of one actor’s hand to the target location. Resting hand location for each side was defined as a position 0.15 m from the edge of the table directly in front of the participant’s right shoulder. This result was in line with results from previous research [13].
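A tree built from a single distance attribute amounts to a threshold rule, and the way its predictive success is estimated can be sketched compactly. The Python snippet below is not C4.5 (there is no gain-ratio criterion, pruning, or minimum node size); it is a simplified one-split stand-in evaluated with 10-fold cross-validation. The function names, the interleaved fold assignment, and the synthetic data in the usage note are all assumptions made for illustration.

```python
def fit_stump(xs, ys):
    """Pick the threshold and direction that best separate the training labels."""
    best_acc, best_rule = -1.0, None
    for t in sorted(set(xs)):
        for above_is_true in (True, False):
            # Predict True above the threshold (or below it, if the direction is flipped).
            hits = sum(((x > t) == above_is_true) == y for x, y in zip(xs, ys))
            acc = hits / len(xs)
            if acc > best_acc:
                best_acc, best_rule = acc, (t, above_is_true)
    return best_rule


def predict(rule, x):
    """Apply a fitted (threshold, direction) rule to one attribute value."""
    t, above_is_true = rule
    return (x > t) == above_is_true


def cross_validated_accuracy(xs, ys, n_folds=10):
    """Fit on n_folds - 1 folds, score on the held-out fold, and pool the hits."""
    hits = 0
    for fold in range(n_folds):
        pairs = list(enumerate(zip(xs, ys)))
        train = [(x, y) for i, (x, y) in pairs if i % n_folds != fold]
        test = [(x, y) for i, (x, y) in pairs if i % n_folds == fold]
        rule = fit_stump([x for x, _ in train], [y for _, y in train])
        hits += sum(predict(rule, x) == y for x, y in test)
    return hits / len(xs)
```

With hypothetical data in which a pass occurs whenever the resting-hand-to-target distance exceeds 0.5 m, `cross_validated_accuracy(distances, passed)` returns the pooled held-out accuracy of the learned threshold; comparing such accuracies with and without an attribute is one way to apply an exclusion criterion like the 3% rule described above.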

On pass trials, participants were not instructed to pass at a certain location. In order to identify pass locations, cluster analysis was conducted using the K-means cluster analysis algorithm, which finds cluster centers that minimize the sum of squared error (SSE) for a given number of clusters, k. We analyzed the release/pass locations to determine whether these locations typically clustered around 1, 2, or 3 cluster centroids (see Figure 4). The optimal number of clusters was identified using the gap statistic, defined as the value of k for which the average difference between the SSE for a reference distribution and the SSE for the actual data was greatest compared to the other values of k [14]. Reference distributions were generated for each dataset (i.e., pass locations from each table side for each pair) by drawing from a uniform distribution over the principal components of pass locations in the dataset. For each pair, separate evaluations were run for each side of the table. For side A, when a participant on side A passed at least once during the experiment (N = 8 pairs), the optimal number of clusters was 1 for all passes on this side of the table. Likewise, when a participant on side B passed at least once during the experiment (N = 9 pairs), the optimal number of clusters was 1 for most pairs (N = 7). When a participant on side A passed during the experiment, the passes clustered around an average (x, y) table location of (0.24 m, 0.62 m). When a participant on side B of the table passed, the passes clustered around an average (x, y) table location of (0.33 m, 0.18 m). Both of these locations were near the resting location of the receiving coactor’s hand.
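The cluster-count selection described above can be sketched in plain Python. The snippet runs Lloyd’s K-means from several random starts and compares the log SSE of the data with the mean log SSE of uniform reference sets for each k. Two simplifications are assumptions of this sketch rather than the procedure used here: the reference sets are drawn uniformly over the bounding box of the data instead of over its principal components, and the k with the largest gap is selected directly. Function names and default settings are likewise illustrative.

```python
import math
import random


def nearest_sq_dist(p, centers):
    """Squared distance from point p to its nearest cluster center."""
    return min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)


def kmeans_sse(points, k, rng, n_init=5, n_iter=30):
    """Run Lloyd's algorithm from several random starts; return the lowest SSE."""
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                        + (p[1] - centers[i][1]) ** 2)
                groups[j].append(p)
            # Move each center to its group mean; empty groups keep their center.
            centers = [
                (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
                for g, c in zip(groups, centers)
            ]
        best = min(best, sum(nearest_sq_dist(p, centers) for p in points))
    return best


def gap_statistic(points, k_values=(1, 2, 3), n_refs=10, seed=0):
    """Return the k with the largest gap and the gap value for every k tested."""
    rng = random.Random(seed)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    gaps = {}
    for k in k_values:
        log_data = math.log(kmeans_sse(points, k, rng))
        log_refs = []
        for _ in range(n_refs):
            # Assumption: uniform reference over the bounding box of the data.
            ref = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
                   for _ in points]
            log_refs.append(math.log(kmeans_sse(ref, k, rng)))
        gaps[k] = sum(log_refs) / n_refs - log_data
    return max(gaps, key=gaps.get), gaps
```

Calling `gap_statistic(pass_locations)` on a list of (x, y) release positions for one table side of one pair returns the selected number of clusters together with the gap for each candidate k.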

2.2.2. Movement

An example set of participant pair trajectories is illustrated in Figure 5 as a heat map. This heat map plot was created by dividing the task space into a 125 × 108 grid; for each trial, the number of times a participant’s location was recorded in a given grid cell was tallied to create a histogram of trajectory locations in table coordinates. A color value was assigned to each cell from a scale of 64 colors. All participants exhibited a qualitatively similar sideways “spaghetti monster” heat map, with concentrations of trajectories (brighter areas) corresponding to discs (far left side of heat map plot), pass/rest locations (top and bottom left of center on the heat map plot), and target locations (5 distinct points across the right of the heat map plot).
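The tallying step behind such a heat map can be sketched as follows. The assignment of 125 cells to the long (x) axis and 108 cells to the short (y) axis, the linear mapping of counts onto 64 color indices, and the function names are assumptions of this sketch; the text specifies only the grid size and the number of colors.

```python
def trajectory_histogram(samples, x_range, y_range, n_cols=125, n_rows=108):
    """Tally how many recorded hand samples fall in each cell of the grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in samples:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore samples recorded outside the table area
        # Samples on the far edge are folded into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * n_cols), n_cols - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_rows), n_rows - 1)
        counts[row][col] += 1
    return counts


def to_color_indices(counts, n_colors=64):
    """Map cell counts linearly onto color indices 0 .. n_colors - 1."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * len(row) for row in counts]
    return [[c * (n_colors - 1) // peak for c in row] for row in counts]
```

For a table of 1.65 m × 0.89 m, `trajectory_histogram(samples, (0.0, 1.65), (0.0, 0.89))` produces the count grid, and `to_color_indices` converts it to values that can be looked up in a 64-entry color scale.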

Participant subtask movements exhibited a bell-shaped velocity profile with the peak velocity occurring around halfway through a given trajectory (Figure 5). For each side of the table, the subtask trajectories examined were rest-to-pick-up, pick-up-to-target, pick-up-to-pass, rest-to-receive, receive-to-target, pass-to-rest, and target-to-pick-up. Across all subtask trajectories, the average peak velocity was 1.231 m/s (Mdn = 1.252 m/s, Q1 = 0.924 m/s, Q3 = 1.373 m/s) and the peak velocity occurred on average around 57% (SD = 15%) of the way through any given subtask trajectory. For the 14 subtask trajectories examined, average peak velocity for each subtask trajectory was significantly correlated, r(14) = 0.89, p < 0.001, with the average straight-line distance of each subtask trajectory. Shorter trajectories had lower average peak velocities than longer trajectories.
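The movement measures reported above can be computed from sampled hand positions along the following lines. This is a sketch under stated assumptions: speed is estimated by finite differences between consecutive samples, the timing of the peak is expressed as a fraction of trajectory duration (the text does not say whether duration or path length was used), and the function names are illustrative.

```python
import math


def speed_profile(positions, dt):
    """Finite-difference speed between consecutive (x, y) samples."""
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def peak_speed_and_timing(positions, dt):
    """Return the peak speed and when it occurs, as a fraction of trajectory duration."""
    speeds = speed_profile(positions, dt)
    peak = max(speeds)
    idx = speeds.index(peak)
    return peak, (idx + 0.5) / len(speeds)  # midpoint of the peak interval


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between trajectory distance and peak speed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Applying `peak_speed_and_timing` to each subtask trajectory (with `dt` set to the recording interval, 1/70 s for data recorded at 70 Hz) and then passing the per-subtask mean distances and mean peak speeds to `pearson_r` reproduces the form of the analysis described above.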

3. Model Extension and Artificial Agent Design

3.1. Pick-Up Decision Extension

As discussed in Section 1.3, we previously developed a dynamic model that characterizes both human movement trajectories and pass decisions in a simple cooperative pick-and-place task [10, 13]. In order to extend this model to the current task context, we also needed to define an action selection function with regard to object pick-up decisions. Based on the results of Experiment 1 (see Section 2.2.1), which found that individuals tended to pick up the object when their hand was closest to its appearance location at the beginning of a trial, we chose to define this function using a similar system to that employed to model pass decisions in (3). More specifically, pick-up decisions were modeled using the system (5), such that a stable fixed point with a decision state greater than zero defines pick up and a stable fixed point with a decision state less than zero defines do not pick up. Here, the task parameter is defined by (6), such that its value is determined by the difference between the current distance of each agent and that of their coactor to the object to be picked up. These distances were normalized by each individual’s respective reach capability and scaled by each actor’s height and a constant task space scaling parameter. According to (6), each individual’s decision to pick up or not pick up was driven by this parameter for each agent, such that when implemented with (5), previous decisions regarding pick-ups affected the current pick-up decision (see Figure 2). Further, while each agent was modeled as making this decision independently, (6) effectively couples each pick-up decision to the other agent’s current state by taking into consideration the coactor’s current normalized distance to the goal. Thus, if both agents were equally close to the pick-up object in normalized reach terms, the model predicts that both agents may decide to move to pick up the object. However, variations in previous pick decisions, as well as action capabilities, e.g., movement speed and trajectory, ultimately result in one agent backing off while the other picks up. Intuitively, this is analogous to a situation where two people reach for the same object in a noncompetitive context and one of them pulls their hand back reactively.

Equation (6) was developed using insights from the results reported in Section 2.2 and validated using data from that study. For validation, (6) was parameterized using the observed trial-to-trial initial locations along with participant heights and arm lengths. The scaling parameter was set to a constant value of 2.8. As noted above, decision predictions were determined based on the sign of the approximated solution to (5), whereby positive solutions defined “pick up” and negative solutions defined “do not pick up”. The equation was validated on each participant pair in the human-human data set and correctly predicted an average of 78% (SD = 12.2%, min = 59%, max = 96.7%) of pick-up decisions.
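The validation logic can be sketched as follows. Because the exact forms of (5) and (6) are not reproduced here, the snippet makes several labeled assumptions: the decision ODE is taken to be $\dot{x} = k + x - x^3$, the task parameter is taken to be the scaling constant (2.8) times the difference between the coactor’s and the agent’s normalized distances to the object, and those normalized distances are treated as precomputed inputs so that no particular use of height or arm length is implied. Function names are illustrative.

```python
def relax(x, k, steps=100, dt=0.01):
    """Euler-integrate the assumed decision ODE dx/dt = k + x - x**3."""
    for _ in range(steps):
        x += (k + x - x ** 3) * dt
    return x


def pickup_parameter(own_norm_dist, coactor_norm_dist, scale=2.8):
    """Assumed form: positive when this agent is nearer in normalized terms."""
    return scale * (coactor_norm_dist - own_norm_dist)


def predict_pickups(trials, x0=0.0):
    """Predict a pick-up decision per trial; the state carries over between trials.

    trials is a list of (own_norm_dist, coactor_norm_dist) pairs.
    """
    x, decisions = x0, []
    for own, other in trials:
        x = relax(x, pickup_parameter(own, other))
        decisions.append(x > 0)  # positive state: pick up; negative: do not
    return decisions


def accuracy(predicted, observed):
    """Proportion of trials on which prediction and observation agree."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

In a hypothetical sequence such as `[(0.2, 0.9), (0.5, 0.5), (0.9, 0.2), (0.5, 0.5)]`, the two trials with equal normalized distances receive different predictions because each inherits the state left by the preceding trial, which is how previous decisions influence the current one in this kind of system.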

3.2. Model-Based Artificial Agent

The proposed extended model could be implemented as an interactive artificial agent system by being embedded in an appropriate control structure (see Figure 6) and embodied in the experimental task context by a virtual avatar identical to the one used for participants in Experiment 1 [42]. This artificial agent system, PAPA, controlled the movements of the avatar’s right hand on the virtual table with all other movements, e.g., arm and torso movements, driven by the inverse kinematics model used in Experiment 1.

The control structure in which PAPA was embedded consisted of two components, one for driving action selection in terms of a current goal location and one for controlling movement dynamics (see Figure 6). The action selection component selected goals based on task phase, defined in terms of whether or not someone was holding the task object. Before the object was picked up, the agent selected its goal as either the pick-up object or a rest position directly in front of the virtual avatar. Pick-up decisions were driven by (5) and (6). The solution to (5) was approximated during each update loop using an Euler integration that took the currently realized state as the initial condition and ran for 100 iterations with a time step of 0.01. The sign of the solution to this integration determined whether the agent’s goal would be defined by the position of the task object or by a rest position in front of the avatar’s body. If the avatar picked up the object, the action selection component integrated (3) and (4), using the same integration method as for the pick-up decision. Note that, in the virtual avatar instantiation of PAPA, the reach capabilities of the agent were constrained according to typical human reach capabilities based on observations in the human-human task. Since most human participants released the object for a pass in a single location, the goal location for passes was defined within a 15 cm × 15 cm region near the coactor’s resting hand location. A specific pass location for any given pass was randomly selected from a logarithmically distributed set of points within this pass region, conforming with previous observations of pass location distributions [13].
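The pick-up decision update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study. Only the integration scheme (Euler, 100 iterations, step 0.01, previous state as initial condition, goal chosen by the sign of the solution) follows the text. The function `decision_rate` is a hypothetical stand-in for (5), here a generic bistable system, and `distance_drive` is an assumed, simplified form of (6) that omits the height scaling.

```python
def decision_rate(x, k):
    """Hypothetical stand-in for (5): a generic bistable system driven by k."""
    return k + x - x ** 3


def distance_drive(own_dist, own_reach, other_dist, other_reach, scale=2.8):
    """Assumed, simplified form of (6): positive when this agent is closer to
    the object than the coactor in reach-normalized terms."""
    return scale * (other_dist / other_reach - own_dist / own_reach)


def update_decision(x, k, steps=100, dt=0.01):
    """Euler integration started from the currently realized state x."""
    for _ in range(steps):
        x += dt * decision_rate(x, k)
    return x


def select_goal(x, object_pos, rest_pos):
    """A positive solution selects the object, otherwise the rest position."""
    return object_pos if x > 0 else rest_pos
```

Because each update starts from the previously realized state, a weak drive toward picking up does not immediately overturn an earlier decision not to pick up, which is one simple way to obtain the history dependence described in the text.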

Movements were driven by (1) and (2), where (1) defined the heading direction of the artificial agent’s end-effector and (2) defined the rate of positional change (i.e., velocity) of the artificial agent’s end-effector movements. The implementation of PAPA’s movement component leveraged Unity’s update logic, which runs program logic once per rendered frame. PAPA’s current heading and velocity were approximated using an Euler integration run at that update rate (approximately 80 Hz). Because the integration occurred in real time at the Unity frame rate, each frame rendered a change in PAPA’s movements equal to a single step of the Euler integration, with a step size equal to the time the program took to run the previous Unity frame. The velocity determined the magnitude of PAPA’s change in position, such that the position change distances were nonconstant and time normalized. Thus, at each Unity frame, the agent’s hand moved in the direction of the calculated heading with a magnitude modulated by the approximate solution to the velocity equation. For each Unity frame, the agent’s actual position and velocity were then used to calculate the current state of the system for (1) and (2). As a result, the integration of the movement components of PAPA was directly embodied in real time by the avatar’s right hand movements. That is, PAPA engaged in the task with no explicit trajectory planning or prediction systems.
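The per-frame integration described above can be illustrated with a short sketch. It is written in Python rather than as a Unity script, and it is not the study's implementation. The functions `heading_accel` and `speed_rate` are hypothetical stand-ins for (1) and (2), and all gains are illustrative values. The part that follows the text is the structure: one Euler step per frame, with the step size `dt` equal to the duration of the previous frame, and no precomputed trajectory.

```python
import math


def heading_accel(phi, phi_dot, goal_angle, b=20.0, k=100.0):
    """Hypothetical stand-in for (1): damped attraction of heading to the goal direction."""
    error = math.atan2(math.sin(phi - goal_angle), math.cos(phi - goal_angle))
    return -b * phi_dot - k * error


def speed_rate(speed, dist, gain=10.0, max_speed=1.0):
    """Hypothetical stand-in for (2): speed relaxes toward a distance-dependent target."""
    target = min(max_speed, 2.0 * dist)
    return gain * (target - speed)


def frame_step(state, goal, dt):
    """Advance the hand by one explicit Euler step of size dt (the last frame time)."""
    x, y, phi, phi_dot, speed = state
    dx, dy = goal[0] - x, goal[1] - y
    dist = math.hypot(dx, dy)
    goal_angle = math.atan2(dy, dx)
    # Derivatives are evaluated from the agent's current (actual) state.
    acc = heading_accel(phi, phi_dot, goal_angle)
    rate = speed_rate(speed, dist)
    # Position changes along the current heading, scaled by speed and frame time.
    x += dt * speed * math.cos(phi)
    y += dt * speed * math.sin(phi)
    phi += dt * phi_dot
    phi_dot += dt * acc
    speed += dt * rate
    return (x, y, phi, phi_dot, speed)
```

In a frame-driven loop, `frame_step` would be called once per rendered frame with the measured frame duration, so a slower frame produces a proportionally larger positional change.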

4. Human-Artificial Agent Experiment

4.1. Method
4.1.1. Participants

Twenty University of Cincinnati students (aged 18 to 28 years) were recruited to participate in the experiment. Eleven females and nine males participated in the study; all were right-handed. All participants were recruited via the Psychology Department’s online recruitment system and received partial course credit for participation. Participants provided written consent prior to completing the study, with the procedures and methodology employed reviewed and approved by the University of Cincinnati Institutional Review Board. A male researcher acted as a confederate throughout the experimental data collection. The confederate was the same for every participant.

4.1.2. Materials and Apparatus

The experimental task setup for the participant was identical to Experiment 1 (see Figure 3), with the exception that only one participant stood at the table to engage in the task. PAPA was used to control the hand of the virtual avatar on the opposite side of the table from the participant. As in the previous experiment, an inverse kinematics controller generated the right arm movements for both the participant and PAPA based on their right hand location.

4.1.3. Procedure

The pick-and-place task and task mechanics were the same as in the previous experiment (see Section 2.1.3). In both conditions, after receiving instructions on how to complete the task and being calibrated in the VR environment, participants were told that their partner would work with them to complete the pick-and-place task. Participants always started on the A side of the table and switched sides after the first block of experimental trials. As in Experiment 1, participants completed 4 blocks of trials. After the first experimental block, participants moved to the center of their side of the table in the lab, and the VR environment was rotated so that the participant and their coactor switched sides of the table. After recalibration of the participant’s view in VR, the participant completed the final block of trials.

In order to understand participants’ behavioral reaction to interacting with a virtual partner, we introduced two information conditions: informed and deception. In the informed condition, participants were told that they would be working with a computer partner in VR to complete a pick-and-place task. In the deception condition, participants were told that they would be working with a human partner in VR to complete a pick-and-place task. Before coming into the lab room, participants sat in a waiting area with a confederate posing as a participant. Both the participant and the confederate were brought into the lab and told that the task involved them being in separate rooms. The participant was asked to select a paper from a box for their room assignment. Participants were always assigned to the experimental room. While the participant waited in their assigned room, the experimenter claimed to take the confederate to a different lab room with a different experimenter. In reality, the confederate was led to a separate lab area and their part in the experiment ended.

Participants stood across the table from their computer partner in the virtual environment. The participant always started on the A side of the table and switched sides after the first block of experimental trials. For the initial practice block, the participant was instructed to complete the task as quickly as they felt comfortable. During this practice block, the computer partner was active and assisted the participant by picking up some objects and taking them to the target. The behaviors of the computer partner (i.e., the artificial agent) were driven by PAPA for this and all subsequent blocks. During the second practice block, the participant was instructed to complete the task as quickly as they felt comfortable and to attempt a pass to the computer partner at least once. For all participants, PAPA ended up passing at least once in this practice block. Following the second practice block, if there were no questions, participants began the first experimental block.

In the virtual environment, the participants stood across the table from their computer partner. They were informed that their partner controlled the hand of the other avatar in the VR environment. For the practice trials, participants were informed that their partner had been instructed to start first and that, when they were ready, they could join in the task to practice. For all participants, PAPA ended up passing at least once in the second practice block. After the experiment, participants were asked a short series of questions regarding their experience and were then informed that the person they entered the lab with was a confederate and that they had actually completed the task with a computer algorithm.

4.2. Results and Discussion
4.2.1. Decisions

In order to compare pick-up decisions across HHI and human-agent interaction (HAI) conditions, we calculated the absolute value of the difference between the average percentages of initiated pick-ups for each coactor in a participant pair (see Figure 7). This provides a measure of the division of labor between coactors, where large values indicate that one coactor tended to pick up more often and small values indicate that coactors initiated pick-ups a similar percentage of the time. In the HHI condition, the average difference in initiated pick-ups was 10.9% (SD = 12.7%, n = 10). For the HAI deception and informed conditions, the average differences were 6.7% (SD = 3.7%, n = 10) and 9.1% (SD = 11.7%, n = 10), respectively. A one-way ANOVA demonstrated no significant difference in the division of labor between HAI and HHI conditions (F(2,27) = 0.43, p = .65), suggesting that participants did not change their pick-up strategies when working with PAPA. A one-way Bayesian ANOVA further suggests that there is moderate evidence that participants did not change their pick-up strategies in the HAI conditions, as seen in Table 1 [43, 44]. In the HAI conditions, three participants in the deception condition and five participants in the informed condition initiated pick-ups on a larger proportion of trials than PAPA.
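The division-of-labor measure described above can be written out as a short sketch. The per-block pick-up counts used as input are a hypothetical data layout chosen for illustration, and the function name is ours.

```python
def division_of_labor(block_counts):
    """Absolute difference between two coactors' average pick-up percentages.

    block_counts: list of (pickups_a, pickups_b) tuples, one per block
    (a hypothetical layout; each block must contain at least one pick-up).
    """
    pct_a, pct_b = [], []
    for a, b in block_counts:
        total = a + b
        if total == 0:
            raise ValueError("each block needs at least one pick-up")
        pct_a.append(100.0 * a / total)
        pct_b.append(100.0 * b / total)
    mean_a = sum(pct_a) / len(pct_a)
    mean_b = sum(pct_b) / len(pct_b)
    return abs(mean_a - mean_b)
```

For example, a pair with pick-up counts of (12, 8) in one block and (10, 10) in the next has average percentages of 55% and 45%, giving a division-of-labor value of 10 percentage points.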

Regarding pass decisions, the percentage of passes made by each participant relative to the number of times they picked up the object was calculated (see Figure 7). In the HHI condition, coactors passed on an average of 24.4% (SD = 18.0%, n = 20) of the trials. For the deception HAI condition, the agent passed an average of 28.0% (SD = 3.2%, n = 10) and the participant passed an average of 26.6% (SD = 9.2%, n = 10). For the informed HAI condition, the agent passed an average of 35.7% (SD = 13.3%, n = 10) and the participant passed an average of 33.0% (SD = 9.0%, n = 10). There was a statistically significant difference between conditions as revealed by a one-way ANOVA (F(2,57) = 3.157, p = 0.05). A Tukey post hoc test revealed that there was a significant difference between the informed HAI condition (M = 34.3%, SD = 11.1%, n = 20) and the HHI condition (p < 0.05), but that there was no statistically significant difference between the HHI and HAI deception conditions (M = 27.3%, SD = 6.7%, n = 20, p = 0.204) or between the HAI informed and deception conditions (p = 0.755). Overall, participants passed more in the HAI informed condition than in the HHI condition, possibly suggesting that knowing the coactor was a computer changed their perception of PAPA’s action capabilities or at least their willingness to make their coactor work more.

4.2.2. Movements

Qualitative comparisons of coactor pair trajectories can be made using heat maps illustrating the trajectories of all coactors in each experimental condition, as seen in Figure 8. Heat maps were produced by creating a 150 × 133 grid of the table space and tallying the number of times PAPA’s or the coactor’s location was recorded within a given grid region. Colors were assigned to each grid cell from a color map with 64 colors.
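The tallying procedure can be sketched as below. The table bounds, the assignment of 150 to columns and 133 to rows, and the linear mapping from counts to the 64 color indices are assumptions made for illustration; the text does not specify them.

```python
def tally_heatmap(points, x_range, y_range, n_cols=150, n_rows=133):
    """Count how often recorded hand locations fall in each grid cell."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore samples recorded off the table
        col = min(int((x - x_min) / (x_max - x_min) * n_cols), n_cols - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_rows), n_rows - 1)
        grid[row][col] += 1
    return grid


def color_indices(grid, n_colors=64):
    """Map cell counts linearly onto color-map indices 0..n_colors-1 (assumed mapping)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[(count * (n_colors - 1)) // peak for count in row] for row in grid]
```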

Along with a visual inspection of trajectory heat maps, the relative difference between the trajectories of pairs within each condition was quantified using the earth mover’s distance (EMD) metric. EMD is widely used in pattern recognition and content-based image retrieval, where it provides a measure of pattern or image similarity based on intensity histograms [45]. In the current context, EMD provides an intuitive quantification of the similarity/difference between pairs of trajectory histograms. A common approach to characterizing EMD is by a metaphor of moving dirt (hence the name), in which EMD is described as treating the bins in compared histograms as differently sized piles of dirt at the bin locations. The value output by the EMD metric represents the minimum amount of effort required to transform one histogram into the other, if dirt can only be moved between adjacent piles. In the current context, a lower EMD value indicates greater overall similarity between compared trajectory histograms and higher values indicate greater overall difference. Greater similarity suggests that there is less variability among the trajectory patterns being compared. Likewise, less similarity suggests that there is greater variability among the trajectory patterns being compared. While EMD is computationally expensive to apply to high-resolution data sets, it has been shown to be robust when the resolution of a data set is significantly compressed [45]. As such, trajectories in the current study were characterized by 2D histograms measuring 50 × 62 bins. These reduced-resolution heat maps are referred to as signatures. Within each condition, a signature was created for the first and second experimental block. For each block, an EMD value was calculated between each participant pair and every other participant pair. An average EMD was then calculated for each participant pair, representing the average trajectory pattern similarity of that participant pair to all other participant pair trajectories. We then used the calculated EMD values to compare movement trajectory similarity between experimental conditions.
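To make the dirt-moving description and the pairwise averaging concrete, the sketch below computes EMD for one-dimensional histograms, where the minimum work reduces to the summed absolute difference of the cumulative distributions. This is a deliberate simplification: the study compared 2D signatures, for which EMD requires solving a transportation problem, and the normalization of each histogram to unit mass is our assumption.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """EMD between two 1D histograms with unit distance between adjacent bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must not be empty")
    # Normalize to unit mass (assumption), then compare cumulative sums.
    cdf_a = accumulate(h / total_a for h in hist_a)
    cdf_b = accumulate(h / total_b for h in hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def mean_emd_to_others(signatures):
    """Average EMD of each signature to every other signature in the group."""
    n = len(signatures)
    return [
        sum(emd_1d(signatures[i], signatures[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
```

Moving all mass from the first to the third of three bins, for instance, costs two bin-widths of work, and a signature that sits between two others receives a lower average EMD than either extreme.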

There was a statistically significant difference between conditions as determined by a one-way ANOVA (F(2,27) = 11.512, p < .001). A Tukey post hoc test revealed that the EMD for the HHI condition (M = 316.11, SD = 27.03, SE = 15.41) differed significantly from both the deception condition (M = 245.87, SD = 13.93, SE = 4.40, p < 0.001) and the informed condition (M = 271.00, SD = 27.03, SE = 8.55, p = 0.014). There was no statistically significant difference between the HAI conditions (p = 0.226). This indicates that the overall trajectory patterns differed between the HHI and HAI conditions, but not between the two HAI conditions. Overall, HHI trajectory patterns were less similar to one another than the trajectory patterns produced in the HAI conditions.

In order to determine if the difference between HHI and HAI conditions was driven by the artificial agent behaviors alone, we calculated trajectory histogram signatures for each individual human participant in all three conditions, as well as for the PAPA produced trajectories in the HAI conditions. This allowed us to examine 3 agent-type groups, i.e., humans in the HHI condition (N = 20), humans in the HAI condition (N = 20), and PAPA instances in the HAI condition (N = 20). An instance of PAPA was defined as a PAPA which was paired with a specific human participant in an HAI condition. As in the previous analysis comparing experimental conditions, histogram signatures were made for each block in the experiment and individuals were compared in a pairwise fashion to all other individuals in their group for that block. After calculating the EMD for all individuals in this manner an average EMD value for each participant and agent instance was calculated. For human produced trajectories, the resulting EMD values represented the average trajectory pattern similarity of each participant to all other human participant trajectories in the human-agent-type groups. Likewise, for PAPA instances, the EMD value represented the average trajectory pattern similarity for each PAPA instance to all other PAPA instances.

There was a statistically significant difference between agent-type groups as determined by a one-way ANOVA (F(2,57) = 15.908, p < .001). A Tukey post hoc test revealed that there was a significant difference between the EMD for the humans in the HHI condition (M = 177.06, SD = 29.40, SE = 6.57, n = 20) and PAPA instances in the HAI conditions (M = 140.34, SD = 17.26, SE = 3.86, n = 20, p < 0.001). There was also a significant difference between the humans in the HHI condition and the humans in the HAI conditions (M = 149.57, SD = 14.60, SE = 3.26, n = 20, p = 0.014). There was no statistically significant difference between the humans and PAPA instances in the HAI conditions (p = 0.535). Thus, trajectory variability remained different between HHI and HAI conditions, and human participants were more similar to their coactors in a given condition.

4.2.3. Perception of Agent

In the deception condition, participants were asked a series of exit questions regarding their perception of their partner. Because participants in the HHI and HAI informed conditions were not deceived with regard to the nature of their partner, the questions were not asked in those conditions. For the questions and a breakdown of participant responses, see Table 2. Overall, participants in the HAI deception condition failed to recognize the deception.

5. General Discussion

The aim of the current project was to build on behavioral dynamic approaches to HAI, developing a human-inspired collaborative agent, with a focus on introducing action selection dynamics into the artificial agent design. In the current pick-and-place task, PAPA was able to successfully collaborate with a human coactor. All instances of PAPA completed the task without additional participant instructions and, in conditions where a confederate was used, without explicitly revealing that PAPA was computer/model driven. The proposed approach converts an observationally grounded collaborative behavioral dynamic model into an embodied dynamic action planning system which can be implemented in both virtual and robotic domains. Unique to PAPA is a demonstration of the embodiment of both dynamic action producing and dynamic action switching components operating in a real-time collaborative planning agent. In the remainder of the paper, we will explore several insights, questions, and challenges raised by our results. PAPA is a relatively simple starting point, demonstrating a novel approach to using hierarchical behavioral dynamical movement primitives of human interactions for designing future collaborative HMI systems.

5.1. Using Human Models for Artificial Agents
5.1.1. Human Movement Constraints

As detailed in Section 1.3, movement dynamics can be given mathematical formulations which characterize not only the overall functional features of an agent’s activities, but also how those behaviors unfold over time [13, 18, 31, 46, 47]. Identifying human relevant dynamical constraints on the behavior of interactive artificial agents provides a method for developing controllers that are robust to changes in task contexts and unexpected task perturbations and do not depend on preplanned trajectories. In the current study, PAPA assumed that human trajectories were not generated as a result of variations on a movement template, but as the result of constraints on a self-organizing dynamical system embedded and parameterized with regard to an environment task context.

By focusing on constraining dynamic trajectory generation without reference to predefined movement trajectories, PAPA can perform adaptive and context-sensitive movements that feel natural to human collaborators. Indeed, in the current task we explored the development and quantification of an artificial agent capable of producing emergent point-to-point trajectories in an ecologically valid collaborative task space. The task space construction meant that few (if any) identical trajectory end points existed in the data set. The implemented behavioral dynamic model was able to produce qualitatively similar trajectories across conditions. The results regarding total trajectory similarities exhibited by the model were mixed (as quantified by the EMD measure), with greater dissimilarity among trajectory patterns indicative of greater variability in trajectory patterns between participants. Thus, while the data indicate that the trajectory patterns produced by PAPA in collaboration with a human were significantly less variable than the trajectory patterns produced by humans collaborating with one another, the trajectory patterns produced by PAPA and humans in the HAI conditions exhibited similar variability. When working with PAPA, the human collaborator adapted their behavior to the agent’s behavior, which is in line with previous research on adaptive HMI systems [4, 34, 48]. Given participant responses to the exit questions, along with the lack of significant differences in trajectory pattern variability within HAI conditions, explicit knowledge of the coactor’s agency was not a significant factor affecting trajectory variability.

One aim of the current study was to implement a set of human interaction-derived behavioral dynamic models as an agent control architecture. As anticipated, the resulting agent was adaptive and easy to interact with and did not seem to raise suspicion in participants. However, the results indicate that future research should determine if there are specific modifications that can make the agent’s behaviors more human-like, particularly with regard to trajectory variability. Previous research on human movement and coordination suggests that variability may be introduced with the addition of noise. Noise may be simulated based on observations of human movement behaviors and could be added into the PAPA algorithms in order to produce greater trajectory variability. This approach has been used successfully in simulations of similar systems and in noninteractive robots using similar control approaches [17, 27, 30, 40]. If the goal is human-likeness, it is likely that the type and magnitude of noise would need to be grounded in observations of human movement patterns or coupling to the human coactor [49–52]. Alternatively, since PAPA was a successful interaction partner and did not seem to get in the way of its coactors, the addition of noise or increased variability may not provide any specific tangible benefit in many application contexts. While human-likeness can be an important goal unto itself, it may also be better to set aside this goal when it does not enhance task success.

5.1.2. Action Selection Dynamics

While previous research on behavioral dynamics of multiagent coordination and dynamical movement primitive models for HMI has primarily focused on movement generation, the current project aimed to extend these modeling approaches by demonstrating the application of hierarchically structured action selection dynamics [32, 33]. Indeed, the addition of action selection within the proposed hierarchical behavioral dynamic framework adds an important tool for developing adaptive and flexible HMI agents capable of dynamically changing behaviors and interaction strategies without explicit task instruction. The action selection model proposed in this paper predicts most of the observed subtask decisions, while grounding both the decisions and their variability in measurable task features. Thus, variability was introduced by the differences in coactor capability, variation in the current task state configuration, and previous task/decision states. Moreover, consistent with previous research, decisions in the pick-and-place context exhibit features indicative of nonlinearity, e.g., hysteresis [13].

Regarding pass decisions observed in HHI and HAI, when participants knew they were working with a computer, they passed the object to their partner significantly more often than when they knew they were working with a human. However, participants who worked with PAPA but thought they were working with a human did not exhibit significantly different passing decisions from either the HHI condition or the informed HAI condition. Nevertheless, while observations in the current HHI condition suggest that participants chose to pass or not pass based almost entirely on their own capabilities, such decisions are known to be affected when the partner is perceived as having less constrained reach capabilities [53, 54]. Accordingly, it is notable that in the most interactive subtask (i.e., passing), participants passed more often to PAPA when they knew PAPA was a computer program. This suggests that human coactors were more inclined to work less when they knew their coactor was not a person. Previous research and the current HHI condition found no influence of the other participant’s reach capabilities on the decision to pass [13]. However, in the current study it appears that pass decisions were affected by the perception of PAPA’s reach capabilities. Thus, while PAPA’s reach capabilities were parameterized such that it would pass and pick up similarly to a human coactor, the knowledge that it was a computer-controlled artificial agent did appear to result in participants viewing its reach capabilities as farther, better, or requiring less effort relative to a human coactor.

5.2. Applications in VR/AR and Robotics

While the collaborative agent proposed in this paper was implemented in VR, the ultimate aim is the development of interactive agents embodied in a physical robotic system [7, 8, 55]. Indeed, PAPA has been implemented as a proof of concept on a Kinova Mico2 robotic arm [42]. Robot collaboration has tremendous potential to produce transformative technologies in a wide range of areas. However, while computational systems are capable of complex planning and control, current hardware systems lack the capacities required to interact in a meaningful way with human collaborators, e.g., speed, safety, and portability. In the current experiment, modern commercial VR was used to provide an intermediate research platform for planning algorithms that might be the basis for future HRI systems. Importantly, in the current task, PAPA was not only able to complete the task with its human collaborator, but able to do so in a way that did not indicate to the participant that it was not in fact a human partner. Both participant behavior and responses to the exit questions support this claim. Given the proliferation of Wizard of Oz studies for testing and comparing algorithms in HRI, VR provides a promising mechanism for obscuring the actual control system from potential collaborators [56, 57]. Moreover, by placing PAPA “in the wild” using VR, we can further determine the role of interaction in shaping behaviors. While the current study provided only a single iteration of parameterization and testing, future work will focus on an iterative design, parameterize, and test process in order to produce a viable interactive agent.

6. Conclusions

The future of successful collaborative virtual and machine agents will depend on a multifaceted design approach that takes the complex dynamics of human action and action selection seriously. In both cognitive science and robotics, researchers have successfully modeled and driven a wide range of individual movement behaviors using complex dynamical systems methods. These methods have been extended to both human-human and human-machine interaction contexts, though in the latter context this has been done relatively independently of the former. In the current research, we have brought together these two perspectives, refining the dynamical movement primitives used to drive motion based on behavioral dynamic models. We have also extended these approaches, introducing methods for dynamically shifting goals by the introduction of task-relevant action selection dynamics. Moreover, we have demonstrated the application of a hierarchical behavioral dynamic model of multiagent, HAI coordination in a nonrhythmic interaction task with multiple and constantly changing goal states. Finally, the artificially controlled agent was able to successfully collaborate with a human coactor in a way that did not cause participants to suspect it was a nonhuman agent.

Data Availability

Deidentified data is available by contacting Dr. Lamb via email at [email protected].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was supported by the National Institutes of Health [NIH R01GM105045], the National Science Foundation [NSF SBE1513801], and an Australian Research Council Future Fellowship awarded to Professor Richardson [FT180100447]. We would like to thank Dr. Mario di Bernardo, Dr. Elliot Saltzman, Dr. Charles Coey, Dr. Auriel Washburn, and Dr. R. C. Schmidt for their reviews of the proposed model and related insights that resulted from other collaborative work. We would also like to thank reviewers and others for feedback on an early version of the HHI research results presented in poster form at the Annual Meeting of the Cognitive Science Society 2017.