An Affective Serious Game for Collaboration Between Humans and Robots


Introduction
There is a growing interest in the field of human-robot interaction (HRI) for the investigation of robots and their emotional abilities through interaction with peers or colleagues on shared tasks [1,2]. Such proximate interaction between humans and robots includes social cues which are perceived to elicit physiological affect (e.g., gaze, expressions, gestures, speed, distance). During such social interactions, humans tend to attribute emotional states to robots [3]. Previous investigations in social and collaborative HRI have aimed to understand how humans and robots interact in a shared physical space to accomplish a goal [4], as numerous studies indicate the importance of such social interactions [5]. Takayama et al. [6] found a significant effect of pet ownership on the proxemic behavior of humans interacting with robot partners: individuals who had previously owned a pet were willing to get closer to the robot partner. These findings motivate the investigation of elicited physiological affect regarding robot partners interacting collaboratively on a shared task. Moreover, they motivate a deeper understanding of how elicited physiological affect influences the decision performance of human partners, to inform the design of robot collaborators and serious games.
The Theory of Mind reasons that humans perceive and distinguish various social cues to explain events in terms of intents and goals of agents (i.e., their actions) which might affect the elicited emotions [3,7]. Following these propositions, this study set out to understand the mechanism of affective exchange that occurs between human and robot agents, based on the observable social cues [3]. Such embodiment provided by the social cues of collaborating robots has a direct link to the role of body information in an intelligent behavior [8]. The robot partners have been designed as autonomous social entities that can feature diverse behavior in the context of this study (i.e., autonomously acting based on a complex algorithm). Humans perceive collaborating robots depending on the physical interaction, actions, shape, and the environment itself. It was shown that such embodied non-humanoid robots are as engaging as humans, eliciting emotional responses in their human partners [3,9].
As evidence shows that emotions critically influence human decision-making and performance [10][11][12], this study sets out to find how a small subset of social cues elicits physiological affect in humans collaborating with robots, in an attempt to investigate how these influence the decision performance on serious game tasks. Humans use the mechanisms from human-human interaction (HHI) to perceive robots as autonomous social agents [3]. These propositions motivate this investigation to take into consideration both HHI and HRI, in the investigation of the elicited physiological affect. This investigation is an extension of the previous study where autonomic non-humanoid robot arms were perceived as social and emotional agents using a small subset of social cues in a serious game [13]. Therefore, this research aims to investigate elicited physiological affect and bring it in relation to the performance on a serious game task.
Traditional physical games require a tangible interaction, contrary to the electronic games which are popular in the contemporary research methods [14]. The physical aspect of serious games is an important factor since humans perceive robots as physical entities which have access to the virtual domain. Therefore, this study uses a traditional game which provides a straightforward measurement of performance, where physical and virtual duality is supported through HRI-enhanced serious games.
This paper investigates the effects of physiological affect underlying such human-robot proximate collaboration, as proposed by Ijsselsteijn et al. [15]. More specifically, it relates these effects to the performance on a serious game task in collaborative HRI by mapping the participants' physiological responses towards the collaborating non-humanoid robot arms onto the arousal/valence axes [16]. These insights could provide a deeper understanding of how elicited physiological affect underlies such interaction from the human collaborator's perspective. They could also inform the design of more meaningful collaborative serious games that use objective measures of physiological affect together with intelligent robot collaborators to potentially increase performance on shared tasks.
The work presented here builds upon previous findings, which have been partly described and published [13]. That publication investigated the ability of non-humanoid robot arms to be perceived as social agents with emotional abilities, through a number of subjective questionnaires probing the social categories and emotions perceived by the human participants. The current manuscript uses objective physiological measures to report the physiological affect elicited in human participants during the interaction with their robot partners and how it influences decision-making performance. The study was a part of the PsyIntEC ECHORD project (FP7-ICT-231143).

Related work
The somatic marker theory claims that decisions are aided by emotions in the form of relevant information contained in the bodily states [17]. Recent discoveries motivated the determination of emotions through a combination of factors [18], which motivated Russell [19] to classify them through a combination of their independent components, arousal and valence. In his model, the level of excitement has been represented by arousal, while valence defined whether the current emotional state is positive or negative.
The evidence further suggests that people are sensitive to the social cues of collaborating robots [4,20] (i.e., gestured motion and speed), where they have been found to elicit emotions in their human partners [21]. The previous investigation used variations in gestured motion and speed of the collaborating robots, from a direct path at a fixed speed to a variable speed in gestured motions [21]. These previous investigations motivate the use of the collaborators in this study to act as a stimulus eliciting emotional responses, bringing them into relation to the performance on a serious game task.

Affect detection
There is a rich body of physiological literature related to the affect detection and its use in the HRI domain [22]. However, only a few studies have used robots as the primary elicitors of affective physiological responses. While the subjective measures (e.g., self-evaluation) always incorporate a potential self-deception, physiological signals are seen as an objective measure of emotional states since they are hard to manipulate intentionally [23]. There is substantial evidence that physiological activities associated with affective states can be differentiated and systematically organized, which would allow for the analysis of their effect on the performance in collaborative serious games [24].
Electrocardiography (ECG) and Galvanic Skin Response (GSR) were employed to measure emotional states of humans in response to robots with more than 80% success rate [25]. Studies have found a strong correlation between ECG and physiological valence [26,27]. There have been multiple findings of a linear correlation between GSR and physiological arousal [28,29]. Even though GSR has been found to indicate physiological valence, the correlations were not always as strong, and greater confidence in interpretations has been obtained by combining GSR with other measures, such as ECG [27].
In psychophysiological experiments, it is easy to neglect the recorded baseline as a valuable source of information about the participants. Not only are there complexities regarding when to measure baseline values, but the law of initial values also limits the extent of the physiological changes that can occur during tasks [30]. Therefore, higher or lower values during baseline would inherently affect physiological response levels in serious games, where subsequent task levels are calculated against these baseline levels. Baseline is the tonic level of activity during a rest period for a specific physiological modality, and it is measured when participants are not responding to a (known) stimulus [31]. This measure is typically taken at the end of a resting period, usually in the last three to five minutes of a ten-minute resting period.
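As an illustration of the baseline definition above, the following minimal sketch takes the mean of the final part of a resting recording as the tonic baseline and expresses task samples relative to it. The function names, the default three-minute window, and the use of simple subtraction are assumptions made for this example; the source does not state how task levels were computed against the baseline.

```python
from statistics import mean


def baseline_level(rest_samples, sample_rate_hz, window_s=180.0):
    """Mean of the final `window_s` seconds of a resting-period recording."""
    n = int(window_s * sample_rate_hz)
    if n <= 0 or n > len(rest_samples):
        raise ValueError("window must fit inside the resting recording")
    return mean(rest_samples[-n:])


def baseline_corrected(task_samples, baseline):
    """Express task samples as a change from the tonic baseline (assumed: subtraction)."""
    return [x - baseline for x in task_samples]


# Four minutes of rest sampled at 1 Hz: the first minute is still settling.
rest = [5.0] * 60 + [2.0] * 180
level = baseline_level(rest, sample_rate_hz=1.0, window_s=180.0)
corrected = baseline_corrected([3.0, 2.5], level)
```

Using only the end of the rest period keeps the settling phase out of the estimate, which is the rationale given in the text for measuring late in the resting period.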

Serious games
Serious games are defined as games which are not used for mere entertainment purposes but might be used as tools to collect behavioral data [32,33], as is the case in this study. They need to captivate and engage players for a specific purpose intended in a 'serious' investigation of certain aspects of human endeavors [34,35]. Studies investigating the interaction between humans and robots in serious games exist, but they are sparse. While research on interaction in a physical environment is rare, simulated computer agents playing games (e.g., chess or checkers) together with humans are more common [36]. Physical collaborators might support higher motivation and better performance in contrast to the traditional collaboration-based digital serious games with a computer (a virtual entity) [37]. This is especially true for HRI-enhanced serious games, which present a physical entity eliciting diverse behaviors and stronger emotional responses in participants [38].
In contrast to the traditional collaboration-based digital serious games where one is playing together with a computer (a virtual entity), HRI-enhanced serious games present a physical entity eliciting diverse behaviors and stronger emotional responses in participants [38]. The authors further state that this might support higher motivation, performance, and focus on a task [38,37]. A collaborative serious game task has been used in this study to extend the research in this domain.
A number of previous studies have used robot collaborators aiding human decisions in serious game settings. Robots were successfully introduced and used in a social interaction context inside serious games: in the 'Tic-tac-toe' serious game, where the robot and humans move the game pieces on a physical board [39]; for the treatment of autism [40] and stress [41]; and in games and other natural social interactions, with humans conveying emotions and robots providing feedback [42]. The dynamics between performance and physiological affect might deepen our understanding of human collaboration with robots in serious games, through the detection of underlying emotional states [43]. Only when those physiological signals are perceived in this way can we grasp their implications for the broader understanding of physiological affect elicited by social cues in HRI-enhanced serious games. Peck et al. [44] used a robotic basketball game to learn the preferences of children with autism spectrum disorder, where the anxiety levels were implicitly measured using physiology. The authors reported an increased performance and higher positive valence between the robot collaborator conditions. Kim et al. [45] designed a game of 'Twenty Questions' as an interactive task with the robot system successfully detecting emotions. Tapus et al. used a robot which continuously monitored the player's performance to adjust elicited stress and consequently frustrate players. The authors reported an increased performance for the lower arousal condition.

Hypothesis
The previous investigations have shown that engaging physical nonhumanoid robot collaborators can elicit emotional responses [3], which in turn might influence human decision-making and performance on game tasks [10]. Furthermore, humans perceive the observable social cues of their collaborators through the mechanisms from HHI [4] (i.e., gestured motion and speed [20]). These might have an effect on elicited emotions [3]. Following up on these findings, this study investigated the influence of human and non-humanoid robot arm collaborators on the performance in a serious game. Thus, this study takes into consideration the influence of physiological affect in response to human and such robot collaborators on the performance in serious games. Therefore, the following hypothesis is presented: Hypothesis 1. The collaborator condition (i.e., human, robot) will affect the performance on the game task (H1).
Previous research suggests that robot collaborators elicit physiological emotions in human partners [21,4]. Moreover, a higher motivation, positive emotions, and sufficient arousal are correlated with higher performance in serious games with physical collaborators [38,46,47]. Csikszentmihalyi and Bosse [47] argue that previous findings are valid unless a challenge is sufficiently beyond or below one's abilities, which might generate anxiety or boredom respectively, resulting in lower performance. It has been found that an engaging environment elicits high physiological arousal, and that those higher levels of arousal were correlated with a higher performance [48]. These studies warrant further investigation of how the performance on a collaborative task is affected by different dimensions of elicited physiological affect in regard to robot collaborators. To expand on these investigations, the following hypothesis is presented: Hypothesis 2. Elicited physiological arousal will be affected by the collaborator condition (H2a), which in turn will affect the performance on the game task (H2b).
While physical collaborators have been found to elicit a higher motivation on a task, which is correlated with positive emotions [46,47], previous studies have not found a strong correlation between physiological valence and performance on a task [48]. To expand on these findings, the following hypothesis is postulated for the investigation of the interactive effects of elicited physiological valence and its influence on the performance in a serious game: Hypothesis 3. Elicited physiological valence will be affected by the collaborator condition (H3a), which in turn will have no significant effect on the performance on the game task (H3b).

Participants
This study included 70 participants (58 males and 12 females). The age of the participants ranged between 19 and 31, with a mean of 23.60 ± 2.34 years. Participants were students of Blekinge Institute of Technology, Sweden. Demographic data (i.e., familiarity with the ToH game task, board games in general, previous possession of a pet, and solving mathematical problems) were collected, and the participants were given a movie ticket as a reward for participating. The experiments were carried out by the Game Systems and Interaction Laboratory (GSIL) at Blekinge Institute of Technology, Karlskrona, Sweden. The Ethical Review Board in Lund, Sweden, approved all experiments (reference number 2012/737) conducted in this study.

Experimental setup
A crossover study with controlled experiments has been conducted in a laboratory setting. The artificial fixture light was used while the temperature was held constant at 23°C ± 1°C throughout the experiment. The participants were seated in a chair with a fixed height and a predefined position. The height and position were constant during the experiment. The two experimenters were always present in the room while they were completely hidden behind the screen. Safety monitoring was executed through a live video feed from the camera. In addition, an emergency stopping sequence and additional fail-safe software were introduced in the case of possible software failures.

Main manipulations
The Tower of Hanoi (ToH) game was used as the serious game for the study. The aim of the task was to move the disks from the starting to the final configuration. The participants played the turn-taking ToH game together with a non-humanoid robot arm collaborator and a human collaborator, with an identical setup between the trials and conditions. Four manipulations were introduced in the experiment (see Fig. 1): (a) the Solo condition, where the participants played the game on their own; (b) the Human Collaborator condition, where the participants played together with a human partner emulating the direct robot collaborator condition; (c) the Direct Robot Collaborator condition, where the robot always moved in a similar fashion along a direct path at a fixed speed; and (d) the Non-Direct Robot Collaborator condition, where the robot had one additional non-direct random point inserted in its path while moving at varying speeds. The Human Collaborator was trained to interact the same way with every participant according to a well-rehearsed procedure. All of the collaborators followed the algorithm and played optimally on each move.

Study stimuli
The ToH was originally a single-player game; in the context of collaborative gameplay, the human and the robot took turns to complete the game at the pace they felt most comfortable with. Since an optimal solution to the game existed, the robot arms were able to easily handle the game, while it was a reasonable challenge for most humans. Following the previous discussion of serious games as tools to collect behavioral data, which engage players in a 'serious' investigation of certain aspects of human endeavors and whose purpose is not pure entertainment, the ToH game was considered a serious game in this scientific exploration of human-robot collaborative interaction. This mathematical game consists of three rods and a number of disks of different sizes that can slide onto any rod. The aim of the game is to start from a given configuration of the disks on the leftmost peg and to arrive in a minimal number of moves at the same configuration on the rightmost peg [49]. In the context of this study, all participants started the first move from the beginning configuration with four disks. The individual trials consisted of moving any single disk to a next legal position, alternating between the participant and the collaborator until the final configuration of disks was reached on the opposite peg from the start. The participants always moved first and the game was linear, which meant that there was always just one possible optimal step to move the disks towards the final configuration. Most of the participants were naïve to the ToH serious game, and it was a reasonable challenge for most of them.
The task was simple to automate since the games including only one human participant where social actions are not required (e.g., chess or checkers), make it possible to create autonomous agents that play optimally [16]. Therefore, there was just one possible optimal step at every move that progressed the disks towards the final configuration. The participants had a choice to take this optimal step as their next move or to take the non-optimal step to move the disks in any other legal position, which would not necessarily lead towards the final configuration. The optimal step was mandatory for all of the collaborators.
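The optimal step that the collaborators were required to take can be illustrated with a short search over the game states. The sketch below is not the controller used in the study; it is a minimal breadth-first search under an assumed encoding in which a state is a tuple giving the peg index (0 = left, 1 = middle, 2 = right) of each disk, smallest disk first. The names `legal_moves` and `shortest_path` are hypothetical.

```python
from collections import deque


def legal_moves(state):
    """Yield (disk, to_peg, next_state) for every legal single-disk move."""
    tops = {}
    for disk, peg in enumerate(state):
        tops.setdefault(peg, disk)  # first disk seen on a peg is its smallest (top) disk
    for peg, disk in tops.items():
        for to_peg in range(3):
            # A disk may only land on an empty peg or on a larger disk.
            if to_peg != peg and tops.get(to_peg, len(state)) > disk:
                nxt = list(state)
                nxt[disk] = to_peg
                yield disk, to_peg, tuple(nxt)


def shortest_path(start, goal):
    """Breadth-first search; returns the list of states from start to goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for _, _, nxt in legal_moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Four disks on the left peg, to be moved to the right peg.
path = shortest_path((0, 0, 0, 0), (2, 2, 2, 2))
optimal_move_count = len(path) - 1
```

Because the search starts from an arbitrary legal state, the same routine can give the next optimal step after a participant has made a non-optimal move: it is the second element of the path returned for the current state.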
The elicitation of physiological affect was achieved through the gestured motions and the speed of the collaborating robot. More specifically, the gestured motions for the Direct Robot Collaborator were composed of a direct path at a fixed speed of 30 cm/s between the two endpoints of the current disk movement. Notably, humans prefer that a robot moves at a speed slower than that of a walking human [20]. For the Non-Direct Robot Collaborator, the gestured motions were composed of a random path and a speed between 5 cm/s and 70 cm/s, where a random point on the path between the two endpoints of the current disk movement was generated online and inserted. The random point was regenerated for each game move the robot arm made, which resulted in a total of three virtual positions the robot arm had to follow while making its move. The robot passed through all the specified positions before arriving at the final position of the disk movement. The experimenter had been trained according to a well-rehearsed script to interact the same way with every participant, in an optimal fashion at every move. Participants were instructed on the rules and trained through a practice game session with the experimenter until they could finish the simple setup with three disks.
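The difference between the two robot conditions can be sketched as a small path generator. This is an illustration only: the source gives the fixed speed (30 cm/s) and the speed range (5 to 70 cm/s), but not how the random point was sampled or whether the speed varied per segment. The sampling box `spread_cm`, the per-segment speed draw, and the function names are assumptions.

```python
import random

DIRECT_SPEED_CM_S = 30.0
SPEED_RANGE_CM_S = (5.0, 70.0)


def direct_path(start, end):
    """One straight segment (from, to, speed) at the fixed speed."""
    return [(start, end, DIRECT_SPEED_CM_S)]


def non_direct_path(start, end, rng, spread_cm=10.0):
    """Two segments through one random intermediate point, each at a random speed.

    The intermediate point is drawn near the straight line between the
    endpoints (assumed sampling scheme), giving three positions in total.
    """
    t = rng.random()
    via = tuple(
        s + t * (e - s) + rng.uniform(-spread_cm, spread_cm)
        for s, e in zip(start, end)
    )
    return [
        (start, via, rng.uniform(*SPEED_RANGE_CM_S)),
        (via, end, rng.uniform(*SPEED_RANGE_CM_S)),
    ]


rng = random.Random(0)  # seeded only to make the example repeatable
segments = non_direct_path((0.0, 0.0, 20.0), (30.0, 0.0, 20.0), rng)
```

A new intermediate point would be drawn for every robot move, so no two non-direct moves follow the same trajectory.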

Experiment procedure
A demonstration of the experimental setup is shown in Fig. 2, where a human and the robot are collaborating on the ToH serious game, sharing the same physical space. It shows the experimental setup for human-robot cooperation with all the physiological sensors attached.
Upon the arrival of a participant, the following procedure was employed:
1. After entering the lab room, each participant was seated in a fixed chair at the table and faced the game task at a 60 cm distance. The participants were given written information about the experiment and a description explaining the ToH game. They were also given written information explaining that data were stored confidentially. When the participants agreed to take part in the experiment, they signed an informed consent form.
2. Before starting the experiment session, the participants played a practice ToH game with three disks in order to become acquainted with the game task.
3. Before the experiment started, the participants filled in a demographics questionnaire.
4. The physiological sensors measuring ECG and GSR were attached, although not all of the recorded physiological measurements were used in the analysis. The participants were asked to relax for four minutes in order to acquire a baseline recording of physiological data.
5. Each participant performed the experiment with the four conditions: solo, human collaborator, direct robot collaborator with the direct gestured movements, and non-direct robot collaborator with the non-direct gestured movements. The order of the four conditions was counterbalanced between the participants in order to minimize ordering effects and explore the difference in elicited physiological affect without sequential effects. Sequential learning of the task was thereby minimized between subsequent conditions. Each experimental condition was conducted as follows:
(a) The participants played the ToH game.
(b) After a trial was finished, a pause of five minutes was allowed between the trial runs, during which the questionnaire was administered.
(c) The operator instructed the participants which game task to perform next through the laptop placed next to them. They were allowed a rest of one minute before starting the task trial. The operator controlled the laptop using a remote desktop.
As suggested in [48], the four conditions (solo, human collaborator, direct robot collaborator and non-direct robot collaborator) were presented to each participant. Each participant repeated each condition three times in succession (thus, a total of 12 ToH games were played per participant). Each experimental session took around 90 min to complete.
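The session structure described above (four counterbalanced conditions, each played three times in succession) can be sketched as a schedule generator. The source does not state which counterbalancing scheme was used; the cyclic rotation below is an assumption chosen for brevity, and it balances only the position of each condition, not which condition precedes which. The names are hypothetical.

```python
CONDITIONS = ["solo", "human", "direct robot", "non-direct robot"]


def session_schedule(participant_index, repetitions=3):
    """Return the ordered list of games for one participant.

    The condition order is rotated by the participant index (assumed scheme),
    and each condition is repeated `repetitions` times back to back.
    """
    shift = participant_index % len(CONDITIONS)
    order = CONDITIONS[shift:] + CONDITIONS[:shift]
    return [condition for condition in order for _ in range(repetitions)]


schedule = session_schedule(participant_index=1)
```

With four rotations, every condition appears in every serial position equally often across each block of four participants.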

Data collection
ECG was measured using two 16-mm Ag/AgCl spot active electrodes in a three-lead unipolar modified chest configuration. The ground electrode was positioned on the left earlobe, while the other two electrodes were positioned on the right collarbone and the lowest rib on the left side of the chest. The recorded ECG signal was amplified, band-pass filtered at 10-40 Hz, and 16-bit digitized before the analysis was performed to detect highly positive R-spikes (heartbeats) in the signal and calculate consecutive R-R intervals [50]. The interval outliers resulting from artifacts or ectopic myocardial activity were edited and linearly interpolated.
To provide the strongest signal variations, GSR was measured using surface electrodes attached to the palmar surface of the middle phalanges from the middle finger and the index finger of the non-dominant hand (to reduce mechanical pressure susceptibility). The participants washed their hands with water and soap before the electrode placement. The temperature and humidity were held constant across the sessions because of GSR susceptibility to their influence.
Both physiological signals were acquired using the Biosemi Active Two physiological data acquisition system and its accompanying ActiView 9 software. The sampling rate was fixed at 2048 Hz for all channels. As described, appropriate amplification and band-pass filtering were performed, and the signal was subsequently down-sampled to 256 Hz upon data reduction.

Data reduction and analysis
The data reduction was performed using Ledalab software for GSR [51], while Kubios software [52] and the HRV Toolkit were used for ECG. The resulting data were compared across the conditions (solo, human collaborator, direct robot collaborator and non-direct robot collaborator) and across individuals for the same trials.
GSR was measured in microsiemens (µS) and analyzed offline. GSR includes short-term phasic responses to specific stimuli, and relatively stable longer-term tonic levels [27]. In continuous stimulus settings, the most common measures of GSR are skin conductance level (SCL) and skin conductance response (SCR), where their changes are thought to reflect general changes in autonomic arousal [28]. The authors stated that the SCR signal is suitable for assessing the intensity of single (phasic) emotions, whereas changes in the overall (tonic) level are rather inert and thus valid only for trials longer than two minutes, such as the overall session in this experiment. Changes in arousal within periods shorter than two minutes are unlikely to be indicated by the SCL, which is particularly limiting in short trials. When the temporal precision of the SCL is insufficient, the rapidly reacting non-specific phasic changes (NS-SCR) are a more promising focus: their number during a given time period is a prominent phasic-based indicator of arousal [53].
The raw GSR signal was down-sampled by a factor of eight (from 2048 Hz to 256 Hz) to remove the high-frequency measurement noise and then smoothed by a 25 ms moving average window. The continuous decomposition analysis was performed. The phasic skin conductance detection algorithm used the following heuristics for the valid peak identification for a particular SCR: the slope of the rise to the peak should have exceeded 0.05 µS/min; the amplitude should have exceeded 0.05 µS; and the rise-to-peak time should have exceeded 0.25 s. The standard threshold was set to a minimum amplitude of 0.05 µS [53,29], because any function of NS-SCR and amplitude remains highly sensitive to the threshold that distinguishes it from noise. Such noise may be a consequence of the instruments' quality, the environment, or interpersonal differences. Once the phasic responses were identified, the rate of responses was determined. All the signal points that were not included in the response constituted the tonic part of the SCL signal. The slope of the tonic activity was obtained using linear regression. Another feature derived from the tonic response was the mean tonic amplitude.
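The GSR reduction steps can be approximated in a few lines. The study used Ledalab's continuous decomposition analysis, which the sketch below does not reproduce; it is a simplified trough-to-peak illustration of the down-sampling, smoothing, amplitude and rise-time thresholds, and tonic slope described above. All function names are hypothetical, block averaging stands in for a proper decimation filter, and the 0.05 µS/min slope criterion is omitted.

```python
def downsample(signal, factor=8):
    """Average consecutive blocks of `factor` samples (e.g., 2048 Hz -> 256 Hz)."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]


def moving_average(signal, window):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def count_scr(signal, sample_rate_hz, min_amp_us=0.05, min_rise_s=0.25):
    """Count trough-to-peak rises exceeding the amplitude and rise-time thresholds."""
    count = 0
    trough = 0  # index of the most recent local minimum
    for i in range(1, len(signal) - 1):
        if signal[i] <= signal[i - 1] and signal[i] < signal[i + 1]:
            trough = i
        elif signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            amplitude = signal[i] - signal[trough]
            rise_time = (i - trough) / sample_rate_hz
            if amplitude > min_amp_us and rise_time > min_rise_s:
                count += 1
    return count


def linear_slope(values, sample_rate_hz):
    """Least-squares slope of the values over time, in units per second."""
    n = len(values)
    times = [i / sample_rate_hz for i in range(n)]
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den
```

At 256 Hz, a 25 ms window corresponds to roughly six samples, so `moving_average(downsample(raw), 6)` would mirror the smoothing step; dividing the SCR count by the trial duration gives the response rate.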
The raw ECG signal was filtered using a high pass filter with a cutoff frequency of 0.1 Hz. The R-peak detection algorithm performed bandpass filtering of the raw ECG signal, and the signal was then smoothed by a 10 ms moving average window. The peaks were then detected in the resulting signal, and the detection heuristic rules were applied to avoid missing R peaks or detecting multiple peaks for a single heartbeat. These rules included obtaining the amplitude threshold (the difference between a peak and the corresponding inflection point) at which a peak is considered a beat: enforcing a minimum interval of 300 ms and maximum interval of 1500 ms between the peaks; checking for both positive and negative slopes in a peak to ensure that the baseline drift is not misclassified as a peak; and backtracking with the reexamination/interpolation when a missing peak was detected. Generally, the average change of a heart rate is expected to range between 2 and 15 bpm [24]. The chosen interval threshold between the peaks was well above the rate of heart rate change due to the genuine heart acceleration. Data were visually inspected for the artifacts which were subsequently corrected. The R-R intervals were extracted. The time-domain features of inter-beat interval, such as the mean and standard deviation, were computed from the detected R peaks. Inter-beat interval variability was explored by performing a power spectral analysis using inter-beat interval data to localize the sympathetic and parasympathetic nervous system activities associated with the different frequency bands. Heart rate variability (HRV) is highly correlated with emotions [54]. Two measures of HRV are the standard deviation of normal-to-normal heartbeat intervals in the time domain (SDNN) and the ratio of low and high-frequency powers (LF/HF) [55]. 
Wang and Huang [56] stated that SDNN and LF/HF were employed as two dimensions in the physiological valence/arousal model, where evidence revealed that SDNN was a good physiological indicator of valence [57,58]. The total variance of HRV increases with the length of the analyzed recording [59]. Thus, in practice, it is inappropriate to compare SDNN measures obtained from recordings of different durations. Therefore, the duration of the recordings used to determine the SDNN values (and similarly, the other HRV measures) was standardized to a minimum of 5 min. Generally, the SDNN levels for participants with positive affect were found to be higher than for those with negative affect [58]. For even shorter recordings, the main spectral components of the LF/HF ratio were distinguished in a spectrum calculated for short-term recordings of 2 to 5 min [60].
Ectopic beats, arrhythmic events, missing data and noise effects can alter the estimation of HRV; therefore, proper interpolation (or linear regression or similar algorithms) on the preceding/successive beats of the HRV signal or its auto-correlation function can reduce this error [61]. The previously described interpolation steps were performed in Kubios software, while the variables of heart rate, HRV, SDNN, and LF/HF ratio were extracted in the HRV Toolkit.
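The time-domain HRV features named above follow directly from the detected R-peak times. The sketch below is illustrative rather than a reproduction of the Kubios or HRV Toolkit processing: it discards implausible intervals instead of interpolating them, uses the sample standard deviation for SDNN, and does not compute the LF/HF ratio, which requires a spectral analysis. Function names are hypothetical.

```python
from statistics import mean, stdev


def rr_intervals_ms(r_peak_times_s):
    """Consecutive R-R intervals in milliseconds from R-peak times in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def plausible_intervals(rr_ms, shortest_ms=300.0, longest_ms=1500.0):
    """Keep intervals inside the 300-1500 ms range used as a detection rule."""
    return [rr for rr in rr_ms if shortest_ms <= rr <= longest_ms]


def time_domain_features(rr_ms):
    """Mean R-R interval, mean heart rate, and SDNN for a list of intervals."""
    mean_rr = mean(rr_ms)
    return {
        "mean_rr_ms": mean_rr,
        "heart_rate_bpm": 60000.0 / mean_rr,
        "sdnn_ms": stdev(rr_ms),
    }


# One missed beat between 2.4 s and 4.4 s produces a 2000 ms interval.
peaks = [0.0, 0.8, 1.6, 2.4, 4.4, 5.2]
features = time_domain_features(plausible_intervals(rr_intervals_ms(peaks)))
```

Because SDNN grows with recording length, as noted above, features computed this way are only comparable between segments of the same duration.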

Hardware and software systems
The hardware system contained two Adept Viper S650 6-DOF robot arms with Robotiq Adaptive 2-finger Grippers as the end effectors. The Adept ACE software was used to control the robot arms, to pick up and drop the specified game disk, and to close/release the grippers. The Microsoft Kinect camera was used to track the moves made by the humans and robots during the ToH game. A camera was also used for the surveillance of the participants and the robot arms, in case of an emergency. A single PC running Windows 7 controlled the system.

Results
Prior to the analysis, both ECG and GSR data were normalized and standardized, and outliers with standardized values (z-scores) of ±3.0 or beyond were removed. One-way and two-way ANOVAs were used to analyze the differences across groups and conditions, while the Pearson product-moment correlation coefficient and Spearman's rank-order correlation were used to analyze correlations in the data. The analysis was performed using SPSS software with the alpha level set at .05. The data showed no violation of normality, linearity, or homoscedasticity.
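The outlier rule and the one-way ANOVA mentioned above can be written out explicitly. The analysis in the study was performed in SPSS; the sketch below only illustrates the underlying arithmetic, with hypothetical function names and made-up numbers. It returns the F statistic and its degrees of freedom but not the p-value.

```python
from statistics import mean, stdev


def drop_outliers(values, z_limit=3.0):
    """Remove values whose z-score is at or beyond +/- z_limit."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) < z_limit]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative data only, not values from the study.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The two degrees of freedom returned here correspond to the pair of numbers reported in parentheses after F in the results.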
The participants reported previous experience with robotics on a seven-point Likert scale, where 1 meant 'no experience' and 7 meant 'familiar experience' (µ = 1.7, σ = 1.095, N = 70). The differences between the participants were analyzed based on the reported values, and the experienced outliers were identified. Six participants from this experienced-outlier group were removed from the analysis to exclude the effects of familiarity on the experience with the robot collaborators, which resulted in 64 valid data samples. Moreover, 43 of the 70 participants had no previous experience with the ToH.

Collaborator conditions and performance
All of the collaborator conditions were comparable in performance, as there was no significant difference in the total number of moves needed to reach the final configuration of disks between the human and robot collaborator condition groups (F(1,569) = 3.705, p > .05), as shown in Fig. 3, where a higher value reflects worse performance. In contrast, a significantly higher number of moves was found in the solo condition (F(3,751) = 20.807, p < .001), since the participants were not expected to know the optimal solution for the ToH game task. This evidence does not lend support to H1, which concerned the impact of the collaborator conditions on the performance on the game task. Therefore, the alternative was preferred: there was no significant difference in performance between the human and robot collaborator conditions.
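For reference, the optimal solution of the classic single-player ToH with n disks takes 2^n − 1 moves, which gives a lower bound against which the number of moves can be read. The sketch below generates that solution recursively. The peg labels and function names are assumptions made for illustration, and the collaborative variant used in this study may impose additional rules that are not modeled here.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move list for the classic n-disk Tower of Hanoi."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the way
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack on top of it
    )


def min_moves(n):
    """Closed form for the length of the optimal solution."""
    return 2 ** n - 1


for disks in range(1, 6):
    print(disks, len(hanoi_moves(disks)), min_moves(disks))
```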

Physiological arousal
A series of analyses was performed to understand the impact of elicited physiological arousal on the performance on the game task for each collaborator condition (H2a and H2b). The ToH serious game was found to elicit high physiological arousal overall (SCL) of 765.209 µS (σ = 835.545 µS), normalized against the baseline. Furthermore, there was a significant difference between the collaborator conditions regarding arousal (F(3,744) = 58.881, p < .001), as shown in Fig. 4 and detailed in Table 1. For simplicity, both robot collaborator condition groups were merged into a single data sample and compared against the human collaborator. A Tukey post hoc test revealed that the physiological arousal indicator NS-SCR was significantly higher for the non-direct robot collaborator (19.59 ± 8.05, p = .03) compared to the direct robot collaborator (17.43 ± 6.76) condition. Moreover, the robot collaborator conditions taken together were significantly higher (18.51 ± 7.49, p < .001) than the solo (11.62 ± 9.11) and the human collaborator (11.01 ± 6.08) conditions. The evidence suggests that robot collaborators elicited higher physiological arousal compared to the human collaborator, lending support to H2a. Moreover, the non-direct robot collaborator elicited higher arousal than the direct one, but this had no influence on the performance.
Interestingly, a significant negative correlation was found between the values of the physiological arousal indicator NS-SCR during baseline and the performance (r_s(61) = −.286, p = .023). The evidence suggests that the participants who were more aroused during the baseline measurement prior to the task performed better at it. Furthermore, there was a significant difference in physiological arousal (SCL) during baseline with respect to pet possession (F(2,61) = 13.47, p < .001): the physiological arousal at baseline prior to the task was significantly higher for the participants who had owned a pet, 7074.54 µS (σ = 2797.307 µS), compared to those who had not, 4966.337 µS (σ = 1931.829 µS).
The worse performing participants showed higher arousal values after each round, as a significant positive correlation was found between the number of moves per round and the values of the physiological arousal indicator NS-SCR (r = .179, N = 739, p < .001). Considering both this evidence and that from the previous paragraph, it seems that there is a difference within the elicited physiological arousal that showed an effect on the performance on the game task, while there might be a more complex effect regarding the different collaborator conditions.
To investigate this problem, a significant interaction was identified between the effects of the collaborator conditions and physiological arousal on the number of moves (performance) on the game task (F(2, 556) = 8.902, p < .001). The participants were grouped based on lower and higher elicited physiological arousal. The simple main effects analysis showed that the collaborator conditions significantly affected the performance when physiological arousal was lower (p < .001), with better performance associated with both robot collaborators compared to the human collaborator. Between the robot collaborators, better performance was associated with the non-direct robot collaborator (p < .001). The presented evidence lent support to the hypothesis that there was a difference within the elicited physiological arousal for different collaborator conditions that showed an effect on the performance on the game task (H2b).
Furthermore, a significant interaction was identified between the effects of the collaborator conditions and their order of presentation on the physiological arousal (SCL) (F(1, 396) = 3.815, p = .05), as well as on the physiological valence (SDNN) (F(1, 405) = 22.509, p < .001). The simple main effects analysis showed that the order of the robot collaborator conditions significantly affected both physiological values for the robot collaborators (p < .001): lower valence and higher arousal were associated with the direct robot collaborator when it was administered first and followed by the non-direct robot collaborator, compared to the higher valence and lower arousal associated with it in the reversed order, as shown in Fig. 5 (a) and (b), respectively.
Fig. 3. Average number of moves per trial in the ToH serious game for each collaborator condition with 95% confidence interval. Stars (***) indicate a significant difference between the Solo condition in contrast to any of the collaborator conditions, at the p < .001 probability level.
Fig. 4. The average physiological arousal values measured by the NS-SCR variable in the ToH serious game for each collaborator condition with 95% confidence interval. A significant difference (p < .001) is observable for both robot collaborators in comparison to the solo and human collaborator conditions, where the physiological arousal indicator NS-SCR was statistically significantly higher (p = .017) for the non-direct robot collaborator compared to the direct robot collaborator condition.

Physiological valence
A series of analyses was conducted to understand the impact of elicited physiological valence on the performance on the serious game task for each collaborator condition (H3a and H3b). The ToH serious game was found to elicit a high (positive) physiological valence overall (SDNN) of .624 s (σ = 1.07 s), normalized against the baseline. Furthermore, there was a statistically significant difference between the collaborator conditions regarding valence (F(3,732) = 3.575, p = .014), as shown in Fig. 6 and detailed in Table 1. A Tukey post hoc test revealed that the valence indicator LF/HF was significantly higher for the non-direct robot collaborator (3.081 ± 1.869, p = .008) compared to the human collaborator (2.453 ± 1.724). The robot collaborator conditions taken together were also significantly higher (2.967 ± 1.841, p = .007) than the human collaborator condition (2.453 ± 1.724). There were no statistically significant differences between the solo and the direct robot collaborator conditions (p > .05). The evidence suggests that the non-direct robot collaborator elicited a higher (positive) physiological valence in the participants, compared to the human collaborator and the direct robot collaborator conditions. These findings suggest that elicited physiological valence was affected by the collaborator conditions, lending support to H3a.
Overall, the participants were found to perform the task equally well regardless of the physiological valence, as no significant correlation was found between the valence indicator LF/HF and the number of moves (p > .05). These results provided support for H3b and indicated that, while there were differences across the collaborator conditions regarding the elicited physiological valence, this had no effect on the performance.
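The two correlation measures used throughout the Results can be sketched in a few lines. The study computed them in SPSS; the functions below are a from-scratch illustration with hypothetical names and made-up data, and they omit the significance test. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


arousal = [1, 2, 3, 4, 5]   # made-up predictor values
moves = [1, 4, 9, 16, 25]   # made-up, monotonic but non-linear
print(pearson_r(arousal, moves), spearman_rho(arousal, moves))
```

The made-up data show why both measures are reported: a monotonic but non-linear relationship yields a perfect rank-order correlation while the Pearson coefficient stays below 1.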

Limitation
The focus on a collaborative task between humans and robots in the context of serious games might limit the generalizability of the findings. Therefore, future studies should investigate other collaborative task contexts between humans and robots. The participant cohort in this study consisted entirely of college undergraduates and graduates. Therefore, future research should deepen the understanding of how different populations interact with robot collaborators, as social robots get integrated into various aspects of life. The time on task for each participant was around 30 min, with 90 min in total for the whole experiment. Even though this duration was comparable with other similar studies, it might have resulted in fatigue effects in this experiment. Nevertheless, the effect was assumed to be minimal, as the results gave evidence of increased performance during the later trials of the game task for each participant.
Fig. 6. The valence indicator LF/HF was statistically significantly higher (p = .008) for the non-direct robot collaborator compared to the human collaborator condition. While both robot collaborator conditions were statistically significantly higher (p < .001) than the solo condition and the human collaborator, there was no statistically significant difference (p > .05) between the solo and the direct robot collaborator condition.

Discussion
This study investigated physiological affect and the performance of participants in a proximate interaction with the human and non-humanoid robot arm collaborators on the serious game task. Furthermore, it examined the effects of elicited physiological arousal and valence on the human-robot collaborative performance on the game task. This investigation was based on the social cues of the collaborating robots (i.e., gestured movement and speed).
The results show that collaboration with non-humanoid robot partners might be as effective as collaboration with human ones, while the worst performing individuals were found in the solo condition. These findings did not lend support to H1, as there was no significant difference in participants' performance between any of the collaborator conditions. A possible explanation is that collaboration with physical entities exhibiting diverse behaviors and eliciting strong emotional responses might have promoted a higher focus on the game task at hand [38,46,47].
The results further show that both robot collaborators elicited higher physiological arousal than the human one, with the non-direct robot collaborator being significantly higher than the direct robot collaborator. These results support previous investigations on social cues in the context of HRI, where the participants were found to be sensitive to the robots' social cues regarding physiological arousal in the context of collaborative serious games [20,4]. The results also indicate that high physiological arousal is correlated with worse performance in the context of collaborative serious games, which is in line with previous investigations on the connection between arousal and performance on a task [62,47]. In those investigations, the authors state that performance is positively correlated with physiological arousal up to the point where the level of arousal becomes too high and the performance decreases. This notion motivated the investigation of whether high physiological arousal was elicited in this study. The results show that the 'lower' physiological arousal group showed a significant effect of collaborator conditions on the performance, while the 'higher' physiological arousal group showed no such effect. Taking these findings into consideration, it seems that the performance on the collaborative serious game task is affected by both the collaborating partners and the elicited physiological arousal, lending support to both H2a and H2b. These findings also indicate that serious games might elicit high physiological arousal, which may have disruptive effects on the performance on the game task, negating the potential benefits of collaborating partners.
On another note, better performance was found in the participants who had higher physiological arousal during baseline. Given the resting nature of baseline measurements, this suggests that these individuals might have been already aroused and motivated to participate in the study, which might have placed them initially at a more optimal position on the bell-shaped arousal-performance curve [62,47], where some amount of arousal is needed for optimal performance on the task. This finding may be further supported by the observation that the individuals who had previously owned a pet were found to have higher physiological arousal at baseline, compared to those who had not. The autonomous nature of pets might have contributed to the higher arousal and excitement of these individuals for the experiments with the robot partners.
The results show that the non-direct robot collaborator elicited higher (positive) physiological valence compared to the human collaborator on the serious game task. These findings indicate that people are sensitive to robots' social cues regarding physiological valence in the context of collaborative serious games. Nevertheless, physiological arousal was found to have a more profound effect on the performance in the context of collaborative serious games, compared to physiological valence, as the participants performed equally well regardless of the elicited physiological valence. These findings are supported by previous studies exploring characteristics of robot behavior in HRI [20,4]. They lend support to both H3a and H3b, where collaborative partners influenced the valence on the task, but without affecting the performance. On the other hand, for the 'lower' arousal participants, better performance was associated with the non-direct robot collaborator, which in turn elicited higher (positive) physiological valence.
The results further showed ordering effects, where the order of presentation of the collaborator conditions significantly affected the elicited physiological affect. The first robot collaborator presented in the experiment was associated with lower valence and higher arousal compared to the second robot collaborator presented. The participants experienced the robot collaborators more positively, and in a more relaxed way, once they had already been introduced to them. This finding is further supported by the reports on previous experience with robots, where most of the participants were not familiar with them, and this might have been the first time they collaborated with one.

Conclusion
Contrary to the standard Wizard-of-Oz approach to studies regarding robots and users [63], this study was conducted in a realistic setting without an obvious presence of the experimenter. The contributions include advances in both the theoretical and practical understanding of physiological affect in the context of HRI, as well as the design of a serious game using non-humanoid robot arm collaborators, which elicited physiological affect essential to such a context. The motivation for this investigation lies in a direct link between the embodiment of physiological affect and the information provided through the social cues of collaborating partners. This link is dependent on the physical interaction and the serious game environment itself [8]. Therefore, this research outlines a method for the objective measurement of physiological affect in collaborative HRI, applied in the context of serious games. Moreover, it supports the notion that even non-humanoid robot collaborators can display social cues and elicit physiological affect in their human partners.
Overall, the collaborators in this study created a physiologically arousing and highly (positive) valenced serious game environment. Regarding H1, the findings indicate that the participants' performance on the serious game task is comparable between the human and robot collaborator conditions, as the number of moves was consistent across all the collaborator conditions. Therefore, the collaborator condition (i.e., human, robot) may not have affected the performance on the collaborative task in a serious game. Regarding H2 and H3, this study found evidence of higher physiological affect (arousal and valence) elicited with the robot collaborators in contrast to their human collaborator counterpart. Therefore, elicited physiological arousal and valence may have been affected by the collaborator conditions between humans and robots on the collaborative task in a serious game. Nevertheless, while arousal may affect the performance on such a game task, valence may not have such a significant effect. These findings motivate the introduction of autonomous robots as partners in the context of collaborative serious games, where the same performance benefits may be achieved as with human partners. The non-direct robot collaborator condition elicited higher physiological arousal and a (positive) valence, compared to the human collaborator. Moreover, it elicited higher physiological arousal than the direct robot collaborator condition, indicating that the careful design of robot partners might leverage different social cues to elicit target physiological arousal in the context of collaborative serious games. Furthermore, such a context may witness a more positive valence elicited when using non-direct robot partners instead of human ones. This may be important as robots get introduced to different aspects of human lives, possibly as team members [3].
Nevertheless, one has to be careful as to which robot partners get introduced to naïve participants, as ordering effects showed lower arousal and more positive valence for the robot collaborator which was introduced after the initial robot condition. It is important to note that people might elicit lower arousal and positive valence for their robot partners after they have been introduced to them once. This might become a prevalent issue as robot partners become pervasive in society. Furthermore, the results also showed that people who owned a pet were more positive and excited to collaborate with robots on the same task. Such individuals who were initially starting the task with a certain amount of arousal performed better on a decision-making task in serious games.
The current study supports the notion that, by understanding the physiological affect underlying such collaborative HRI from the human perspective, it would be possible to design more personalized serious games with intelligent robots which act together with human partners, eliciting relevant physiological affect [64]. This may contribute to improving the quality of HRI by informing the design of such collaborative serious games. On the other hand, one has to be careful when designing serious games which elicit high physiological arousal, as such high levels of physiological arousal may be correlated with lower performance [47,62]. In contrast, physiological valence may not have such a significant effect. The results showed that individuals who were more aroused prior to the task performed better on the task, giving further evidence for the introduction of robot partners on the task, at least at this novelty stage. Nevertheless, valence proved to be a more complex issue, where individuals who had optimal 'lower' arousal on the task were able to benefit from higher 'positive' valence and increase their performance. If one considers designing serious game environments that elicit lower physiological arousal using robot instead of human collaborators, then one might witness an increase in the performance on a serious game task. This study found evidence that better performance was associated with the robot collaborators compared to the human ones only for the 'lower' physiological arousal group, which was the only group showing a statistically significant effect of the collaborator conditions on the performance. Finally, if one considers that robots possess a physical-virtual duality and have access to the game task information, the choice of robot collaborators in serious games appears to be a sound one.
Taking a step forward with using the physiological measurements, recognition of affective states is expected by cooperating humans, which may allow them to be aware of their emotions through the presentation of sufficient feedback [65]. Future studies should investigate the recognition of participants' emotions on-line using physiological measurements to adapt the robots' behavior in a closed-loop social interaction [66], as the embodiment is a powerful concept in the development of the adaptive autonomous systems [67].

Declaration of Competing Interest
The authors declared that there is no conflict of interest.