Investigating the effect of the reality gap on the human psychophysiological state in the context of human-swarm interaction

ABSTRACT

The reality gap is the discrepancy between simulation and reality: the same behavioural algorithm results in different robot swarm behaviours in simulation and in reality (with real robots). In this paper, we study the effect of the reality gap on the psychophysiological reactions of humans interacting with a robot swarm. We compare the psychophysiological reactions of 28 participants interacting with a simulated robot swarm and with a real (non-simulated) robot swarm. Our results show that a real robot swarm provokes stronger reactions in our participants than a simulated robot swarm. We also investigate how to mitigate the effect of the reality gap (i.e., how to diminish the difference in the psychophysiological reactions between reality and simulation) by comparing psychophysiological reactions in simulation displayed on a computer screen and psychophysiological reactions in simulation displayed in virtual reality. Our results show that our participants tend to have stronger psychophysiological reactions in simulation displayed in virtual reality (suggesting a potential way of diminishing the effect of the reality gap).

The results of the Friedman test on the psychophysiological data do not show any main effect of the reality gap on our participants' heart rate (χ²(2) = 0.78, p = .67). The results show, however, a main effect of the reality gap on our participants' skin conductance level (χ²(2) = 15.2, p < […]

[…] Only the self-reported arousal shows that our participants had stronger reactions during simulation in virtual reality than during simulation on a computer screen. With these results, we cannot strongly confirm our second hypothesis. However, the results of the skin conductance and the self-reported valence, combined with the significant results of the arousal, show a trend towards stronger psychophysiological responses in virtual reality than in front of a computer screen.


In the near future, swarms of autonomous robots are likely to be part of our daily life. Whether swarms of robots will be used for high-risk tasks (e.g., search and rescue, demining) or for laborious tasks (e.g., harvesting, environment cleaning, grass mowing) (Dorigo et al., 2014, 2013), it will be vital for humans to interact with these robot swarms (e.g., supervise, issue commands or receive feedback).

Recently, human-swarm interaction has become an active field of research. Increasingly, researchers in human-swarm interaction validate their work by performing user studies (i.e., a group of human participants performing a human-swarm interaction experiment). However, a large majority of the existing user studies are performed exclusively in simulation, with human operators interacting with simulated robots on a computer screen, e.g., Bashyal […]

Simulation is a convenient choice for swarm roboticists, as it allows experimental conditions to be replicated perfectly in different experimental runs. Even more importantly, gathering enough real robots to make a meaningful swarm is often prohibitively expensive in terms of both money and time. However, conducting user studies in simulation suffers from a potentially fundamental problem: the inherent discrepancy between simulation and reality (henceforth referred to as the reality gap).

In this paper, we study the effect of the reality gap on human psychology. Understanding the psychological impact of any interactive system (be it human-computer interaction, human-robot interaction or human-swarm interaction) on its human operator is clearly essential to the development of an effective interactive system (Carroll, 1997). To date, it is not yet clear what the effect of the reality gap is on human psychology in human-swarm interaction studies. Our goal is to study this effect.

We present an experiment in which humans interact with a simulated robot swarm displayed on a computer screen, with a simulated robot swarm displayed in virtual reality (using a virtual reality headset) and with a real (i.e., non-simulated) robot swarm (see Fig. 1). In our experimental setup, our goal was to produce results that were as objective as possible. To this end, we firstly recorded psychological impact using psychophysiological measures (e.g., heart rate, skin conductance), which are considered more objective than purely questionnaire-based methods (Bethel et al., 2007). Secondly, we made the interaction of our human operators with the robot swarm purely passive. In this purely passive interaction, our participants do not issue any commands to, nor receive any feedback from, the robot swarm. Finally, we decided that our participants would interact with a robot swarm executing a simple random walk behaviour (compared to a more complex foraging behaviour, for instance). These last two choices allow us to isolate the effect of the reality gap. The passive interaction reduces the risk that psychophysiological reactions to the interaction interface (e.g., joystick, keyboard, voice commands) would be the strongest measurable reaction, drowning out the difference in reaction to the reality gap. The choice of a simple random walk behaviour reduces the risk that any psychophysiological reactions are caused by artefacts of a complex swarm robotics behaviour.

Our results show that our participants have stronger psychophysiological reactions when they interact with a real robot swarm than when they interact with a simulated robot swarm (either displayed on a computer screen or in a virtual reality headset). Our results also show that our participants reported a stronger level of psychological arousal when they interacted with a robot swarm simulated in a virtual reality headset than when they interacted with a robot swarm simulated on a computer screen (suggesting that virtual reality is a technology that could potentially mitigate the effect of the reality gap in human-swarm interaction user studies). We believe the results we present here should have a significant impact on best practices for future human-swarm interaction design and test methodologies.

Human-swarm interaction, the field of research that studies how human beings can interact with swarms of autonomous robots, is receiving increasing attention. Some research focuses on more technical aspects, such as control methods (direct control of the robots, or indirect control of the robots) (Kolling […]

[…] operator to send data (e.g., commands) to and receive data (e.g., feedback) from each individual robot. A second reason for the difference is that there is no social interaction between human beings and robot swarms.

In this paper, we study the differences in psychological reactions when a human being passively interacts with a real robot swarm, with a simulated robot swarm displayed in a virtual reality environment, and with a simulated robot swarm displayed on a computer screen. Moreover, while all of the aforementioned social robotics works only use dedicated psychological questionnaires to study the participants' psychological reactions, we use a combination of a psychological questionnaire and physiological measures in order to study the psychophysiological reactions of participants interacting with a robot swarm.

[…] performed in simulation. We believe that conducting a human-swarm interaction experiment in simulation can lead to different results than if the same experiment were conducted with real robots. One reason for the results to differ between simulation and reality is the inherent presence of the reality gap. It is not always possible, however, to perform a human-swarm interaction experiment with real robots (e.g., because an experiment requires a large number of robots). It is our vision that the effects of the reality gap in simulation should be mitigated as much as possible. In order to mitigate the effects of the reality gap, we propose to use virtual reality for simulating the robot swarm. We based the experiment of this paper on these two hypotheses:

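• The psychophysiological reactions of humans are stronger when they interact with a real robot swarm than when they interact with a simulated robot swarm.

• The psychophysiological reactions of humans are stronger when they interact with a simulated robot swarm displayed in a virtual reality environment than when they interact with a simulated robot swarm displayed on a computer screen.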

Computer Science
Measures

We use two types of measures: self-reported measures and psychophysiological measures. We use self-reported measures (i.e., data gathered from our participants using a dedicated psychological questionnaire) to determine whether our participants are subjectively conscious of changes in their psychophysiological reactions and whether these changes are positive (i.e., our participants report having a positive experience) or negative (i.e., our participants report having a negative experience). We use psychophysiological measures, on the other hand, to determine objectively the psychological state of our participants based on physiological responses. These psychophysiological measures are considered objective because it is difficult for humans to intentionally manipulate their physiological responses (for instance, to intentionally decrease heart rate). In the following two sections, we first present the self-reported measures used in this study. Then, we present the psychophysiological measures.

In this study, we collect our participants' self-reported affective state. We measure our participants' […] (Mehrabian, 1996) felt during an evaluation. We developed an open source electronic version of the Self-Assessment Manikin (SAM) questionnaire (Lang, 1980). This electronic version of the SAM questionnaire runs on a tablet device. The SAM […]

The electrodermal activity (i.e., the skin's electrical activity) and the cardiovascular activity are two common physiological activities used in the literature to study the autonomic nervous system. In this research, we study our participants' electrodermal activity by monitoring their skin conductance level (SCL) and we study our participants' cardiovascular activity by monitoring their heart rate.

The SCL is a slow variation of the skin conductance over time and is measured in microsiemens (µS). An increase of the SCL is due only to an increase of the sympathetic nervous system activity. It is, therefore, a measure of choice to study the human fight-or-flight response. SCL has also been correlated with the arousal component of the affective state (Boucsein, 2012). The heart rate is the number of heart beats per unit of time. It is usually measured in beats per minute (BPM). Unlike the SCL, though, a variation of the heart rate cannot be unequivocally associated with a variation of the sympathetic nervous system only. Heart rate can vary due to a variation of the sympathetic nervous system, a variation of the parasympathetic nervous system, or a combination of both (Cacioppo et al., 2007). Heart rate activity is, therefore, more difficult to analyse and interpret than the SCL.

[…] responses between our participants, we first recorded our participants' physiological responses at rest (i.e., the baseline), then we recorded our participants' physiological responses during the experiment. In our statistical analyses, we use the difference between our participants' physiological responses at rest and during the experiment.
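As an illustration of the baseline correction described above, the following Python sketch computes a heart rate in BPM from pulse-peak timestamps (60 divided by the time in seconds between two beats) and subtracts the resting value from the session value. It is a minimal sketch, not the analysis code used in the study: the function names, the averaging of instantaneous rates, the sign convention (session minus rest) and the timestamps are all assumptions made for the example.

```python
from statistics import mean


def heart_rate_bpm(peak_times_s):
    """Mean heart rate (BPM) from timestamps, in seconds, of successive pulse peaks."""
    # Inter-beat intervals: time between consecutive peaks.
    ibis = [t1 - t0 for t0, t1 in zip(peak_times_s, peak_times_s[1:])]
    if not ibis:
        raise ValueError("at least two peaks are needed")
    # Assumption: average the instantaneous rates 60 / IBI over the recording.
    return mean(60.0 / ibi for ibi in ibis)


def baseline_corrected(rest_value, session_value):
    """Assumed sign convention: response during the session minus response at rest."""
    return session_value - rest_value


# Hypothetical peak timestamps (seconds); these are not data from the study.
rest_peaks = [0.0, 1.0, 2.0, 3.0, 4.0]      # one beat per second
session_peaks = [0.0, 0.8, 1.6, 2.4, 3.2]   # one beat every 0.8 s

delta_hr = baseline_corrected(heart_rate_bpm(rest_peaks), heart_rate_bpm(session_peaks))
print(f"baseline-corrected heart rate: {delta_hr:.1f} BPM")
```

The same subtraction would apply to the skin conductance level, with the mean SCL at rest and during a session in place of the two heart rates.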

Physiological response acquisition

We monitored our participants' physiological responses with a PowerLab 26T (ADInstruments) data acquisition system augmented with a GSR Amp device. The PowerLab 26T was connected via USB to a laptop computer running Mac OS X Yosemite. We used the software LabChart 8 to record the physiological responses acquired by the PowerLab 26T data acquisition system. We used an infrared photoelectric sensor (i.e., a photoplethysmograph) to measure the blood volume pulse (BVP) of our participants (i.e., changes in the pulsatile blood flow). The blood volume pulse can be measured with the photoplethysmograph at peripheral parts of the human body, such as the fingers. We can compute the heart rate from the blood volume pulse. Firstly, we calculate the inter-beat interval (i.e., the time in seconds between two consecutive peaks in the blood volume pulse). Then, we calculate the heart rate by dividing 60 by the inter-beat interval. For […]

When an experiment starts, the 20 robots perform a random walk with obstacle avoidance behaviour for a period of 60 s. Each robot executes the following two steps: i) it drives straight with a constant velocity of 10 cm/s, and ii) it changes its direction when it encounters either a robot or an obstacle in the direction of movement (i.e., it turns in place until the obstacle is no longer detected in the front part of its chassis).
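The two-step behaviour described above can be sketched as a small point-robot simulation in Python. Only the number of robots (20), the duration (60 s) and the forward speed (10 cm/s) come from the text. The arena size, sensing range, frontal detection angle, turning rate, control period and all names are assumptions chosen for illustration; this is not the controller that ran on the robots.

```python
import math
import random

N_ROBOTS = 20                          # from the text
DURATION_S = 60.0                      # from the text
SPEED_CM_S = 10.0                      # from the text
DT_S = 0.1                             # assumed control period
ARENA_CM = 200.0                       # assumed side of a square arena
SENSE_CM = 8.0                         # assumed detection range
FRONT_HALF_ANGLE = math.radians(60)    # assumed extent of the "front part"
TURN_RAD_S = math.radians(90)          # assumed turn-in-place rate


def obstacle_ahead(i, robots):
    """True if a wall or another robot is detected in front of robot i."""
    x, y, heading = robots[i]
    # Walls: probe the point SENSE_CM straight ahead.
    px = x + SENSE_CM * math.cos(heading)
    py = y + SENSE_CM * math.sin(heading)
    if not (0.0 <= px <= ARENA_CM and 0.0 <= py <= ARENA_CM):
        return True
    # Other robots: within range and inside the frontal sector.
    for j, (ox, oy, _) in enumerate(robots):
        if j == i:
            continue
        dx, dy = ox - x, oy - y
        if math.hypot(dx, dy) <= SENSE_CM:
            bearing = math.atan2(dy, dx) - heading
            bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
            if abs(bearing) <= FRONT_HALF_ANGLE:
                return True
    return False


def step(robots):
    """Advance every robot by one control period."""
    updated = []
    for i, (x, y, heading) in enumerate(robots):
        if obstacle_ahead(i, robots):
            heading += TURN_RAD_S * DT_S                    # step ii: turn in place
        else:
            x += SPEED_CM_S * DT_S * math.cos(heading)      # step i: drive straight
            y += SPEED_CM_S * DT_S * math.sin(heading)
        updated.append((x, y, heading))
    return updated


def simulate(seed=0):
    """Run the behaviour for DURATION_S and return the final (x, y, heading) list."""
    rng = random.Random(seed)
    robots = [
        (rng.uniform(20.0, ARENA_CM - 20.0),
         rng.uniform(20.0, ARENA_CM - 20.0),
         rng.uniform(-math.pi, math.pi))
        for _ in range(N_ROBOTS)
    ]
    for _ in range(round(DURATION_S / DT_S)):
        robots = step(robots)
    return robots


final = simulate(seed=1)
print(f"{len(final)} robots simulated for {DURATION_S:.0f} s")
```

In this simplified model the robots are points, so the sketch only reproduces the drive-straight and turn-in-place logic, not the physical footprint or the sensors of the e-puck.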

The platform used in this study is the wheeled e-puck robot (see Fig. 5) equipped with an extension board.

Manuscript to be reviewed
[…] interview with her. During the interview, we explained the goal of the study to the participant. Then, we answered the participant's questions. We finished the experiment by thanking the participant and by giving her the 7 € incentive. The entire experiment lasted 30 minutes per participant.

Out of the 28 participants who took part in the experiment, we had to remove the physiological data (i.e., heart rate and skin conductance) of 5 participants due to sensor misplacement. We, however, kept the self-reported data (i.e., valence and arousal values reported by the SAM questionnaire) of these 5 participants. In the remainder of this section, therefore, we analyse the psychophysiological data of 23 participants (15 female and 8 male) and the self-reported data of 28 participants (17 female and 11 male). We analysed our data with the R software (R Core Team, 2015) by performing a repeated measures design analysis.

Because the data was not normally distributed, we did not use the repeated-measures ANOVA test (as the test assumes a normal distribution). Rather, we used a non-parametric Friedman test to analyse both the psychophysiological data and the self-reported data (i.e., the SAM questionnaire). The Friedman test is a rank-based test that does not make any assumption about the distribution of the data. In our case, […]

In order to determine which sessions differ in their median, we proceeded with a pairwise comparison of the three sessions with a Wilcoxon signed-rank test. The null hypothesis of the Wilcoxon signed-rank test states that the median difference between the paired values from two sessions (i.e., a value from one session paired with a value from another session) is equal to zero. The alternative hypothesis states that the median difference of the paired values is not equal to zero. When the Wilcoxon signed-rank test is significant, we can reject the null hypothesis in favour of the alternative hypothesis, and conclude that there is a significant difference between the two sessions.
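To make the omnibus comparison of the three sessions concrete, the sketch below computes the Friedman statistic in plain Python: values are ranked within each participant, rank sums are accumulated per session, and the usual tie correction is applied. With three sessions the statistic has two degrees of freedom, for which the chi-square upper tail probability is exp(-x/2). The study itself used R; the function names and the data here are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0          # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic for rows of paired measurements (one row per participant)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        for count in Counter(row).values():
            tie_term += count ** 3 - count
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))   # equals 1 without ties
    return stat / correction


def p_value_df2(chi2):
    """Upper tail of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Hypothetical baseline-corrected values, one row per participant,
# columns: Real Robots, Virtual Reality, Screen Simulation. Not study data.
rows = [
    [3.1, 2.0, 1.2],
    [2.5, 1.9, 0.7],
    [1.8, 2.2, 0.9],
    [2.9, 1.1, 1.0],
    [2.4, 1.6, 1.7],
]
chi2 = friedman_chi2(rows)
p = p_value_df2(chi2)
print(f"chi2(2) = {chi2:.2f}, p = {p:.3f}")
```

The closed-form p-value only holds for three conditions; a design with a different number of sessions would need a general chi-square distribution function.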

Performing multiple pairwise comparisons (there are three pairwise comparisons between our three sessions) increases the risk of a Type I error, i.e., of declaring a test significant when it is not. In order to control the Type I error, we apply a Bonferroni-Holm correction to the p-values obtained by the Wilcoxon signed-rank test.
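The Bonferroni-Holm step-down adjustment can be sketched in a few lines of Python: the p-values are sorted in ascending order, the smallest is multiplied by the number of comparisons, the next by one fewer, and so on, while enforcing that adjusted values never decrease. The function name and the three raw p-values below are invented for the example; they are not the values obtained in the study.

```python
def holm_adjust(p_values):
    """Holm-adjusted p-values, returned in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for the three pairwise session comparisons.
raw = [0.004, 0.030, 0.020]
adjusted = holm_adjust(raw)
print(adjusted)
```

A comparison is then declared significant when its adjusted p-value is below the chosen significance level.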

In addition to determining the effect of the reality gap on our participants, we also determined whether psychophysiological data and self-reported data were correlated (e.g., whether skin conductance is correlated with arousal, or whether arousal and valence are correlated). In order to determine this correlation, we performed a Spearman's rank-order correlation test.
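Spearman's coefficient is the Pearson correlation computed on the ranks of the two variables. The Python sketch below illustrates this definition, with tied values sharing their average rank. The function names and the paired values are hypothetical, and the sketch returns only the coefficient ρ, not the associated p-value.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of equal length (at least 2 values)")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired values (e.g., skin conductance change and reported arousal).
scl = [0.4, 0.9, 1.3, 2.0, 2.6]
arousal = [3, 2, 6, 5, 8]
rho = spearman_rho(scl, arousal)
print(f"rho = {rho:.2f}")
```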

In Table 1, we summarise the results of the psychophysiological and self-reported data (i.e., median and Friedman's mean rank of heart rate, SCL, arousal and valence) in each session (i.e., Real Robots, […]

Table 1. Descriptive statistics of the psychophysiological data and of the self-reported data. We report the median and the Friedman's mean rank (in parentheses) of the three sessions (Real Robots, Virtual Reality, Screen Simulation). We also report the inferential statistics of the Friedman test (i.e., χ² and p).

In addition to studying the effect of the reality gap on our participants, we investigated whether or not some of the dependent variables (i.e., heart rate, skin conductance, arousal and valence) were pairwise correlated. In order to calculate a correlation between psychophysiological data and self-reported data (e.g., correlation between skin conductance and arousal), we only took into account the self-reported data of the participants whose psychophysiological data had not been rejected (due to sensor misplacement). For the correlation test between arousal and valence, we used the data points of all 28 participants. We did not find any correlation within each of the three sessions (i.e., there was no correlation for any pair of dependent variables within the Real Robots session, the Virtual Reality session or the Screen Simulation session). We, therefore, investigated whether there were any correlations when the data of all conditions were pooled together (e.g., we aggregated skin conductance values from the three sessions). Regarding correlation between psychophysiological data and self-reported data, we found a correlation between skin conductance and valence (ρ = .42, p < .001) and a weak correlation between skin conductance and arousal (ρ = .253, p = .03). There was no correlation between heart rate and valence, nor between heart rate and arousal. Concerning the self-reported data, we found a correlation between arousal and valence (ρ = .32, p = .002). We did not find any correlation between heart rate and skin conductance.

Finally, we also studied the gender effect (i.e., whether females and males differ in their results) and the session order effect (i.e., whether the participants become habituated to the experiment). We analysed the gender effect by splitting the results of each dependent variable (i.e., heart rate, skin conductance, arousal and valence) for each condition (i.e., Screen Simulation, Virtual Reality, Real Robots) into two groups, males and females. We compared these two groups with a Wilcoxon rank-sum test (the equivalent of the Wilcoxon signed-rank test for independent groups). We did not find any statistically significant difference between males and females in any condition, for any dependent variable. We studied the session order effect as follows. For each condition and for each dependent variable, we separated the results into three groups: those of the participants who encountered the session first, second or third, respectively. We compared the three groups with a Kruskal-Wallis test (a non-parametric test similar to the Friedman test, but for independent groups). We did not find any statistically significant difference among the three groups in any session, for any dependent variable, suggesting that the session order had no significant effect on our participants.
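For the comparison of two independent groups, the Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test, which the Python sketch below uses for illustration. It counts, over all pairs, how often a value of the first group exceeds a value of the second, and obtains an exact two-sided p-value by enumerating every way of splitting the pooled values into groups of the same sizes. The function names and the values are hypothetical, and the exhaustive enumeration is only practical for small groups.

```python
from itertools import combinations


def mann_whitney_u(xs, ys):
    """U statistic of xs: number of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def exact_two_sided_p(xs, ys):
    """Exact permutation p-value: share of regroupings at least as extreme as observed."""
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    expected = n_x * len(ys) / 2.0            # mean of U under the null hypothesis
    observed = abs(mann_whitney_u(xs, ys) - expected)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical values of one dependent variable for two independent groups.
group_a = [1.2, 0.8, 1.5, 0.9]
group_b = [0.4, 0.7, 0.3, 1.0, 0.6]
u = mann_whitney_u(group_a, group_b)
p = exact_two_sided_p(group_a, group_b)
print(f"U = {u:.1f}, p = {p:.3f}")
```

For groups of the size used in the study, a statistics package would typically rely on a more efficient exact algorithm or on a normal approximation instead of full enumeration.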

In this paper, we presented a study on the effect of the reality gap on the psychophysiological reactions of humans interacting with a robot swarm. We had two hypotheses. The first hypothesis stated that humans interacting with a real (i.e., non-simulated) robot swarm have stronger psychophysiological reactions than if they were interacting with a simulated robot swarm. The second hypothesis stated that humans interacting with a simulated robot swarm displayed in a virtual reality environment have stronger psychophysiological reactions than if they were interacting with a simulated robot swarm displayed on a computer screen.

Both the self-reported data (i.e., arousal and valence) and the psychophysiological data (i.e., skin conductance) show that the reality gap has an effect on human psychophysiological reactions. Our participants had stronger psychophysiological reactions when they were confronted with a real swarm of robots than when they were confronted with a simulated robot swarm (in virtual reality and on a computer screen). These results confirm our first hypothesis.

Of course, it is not always possible for researchers to conduct a human-swarm interaction study with real robots, essentially because real robots are still very expensive for a research lab and real robot experiments are time-consuming. It is, therefore, not realistic to expect human-swarm interaction researchers to conduct human-swarm interaction experiments with dozens or hundreds of real robots. For this reason, we decided to investigate the possibility of using virtual reality in order to mitigate the effect of the reality gap. To the best of our knowledge, virtual reality has never yet been used in the research field of human-swarm interaction and is little studied in social robotics (Li, 2015). Only the self-reported arousal shows that our participants had stronger reactions during simulation in virtual reality than during simulation on a computer screen. With these results, we cannot strongly confirm our second hypothesis.

In this paper, we designed our experiment based on a purely passive interaction scenario. In a passive interaction scenario, human operators do not issue commands to a robot swarm. We motivated our choice of a passive interaction by the fact that an active interaction could influence the human psychophysiological state (making it difficult to separate the effect of the active interaction and the effect of the reality gap on our participants' psychophysiological state). However, now that we have shown the effect of the reality gap in a purely passive interaction scenario, future work should focus on this effect in an active interaction scenario in which human operators do issue commands to a robot swarm. For instance, we could use the results presented in this paper as a baseline and compare them with those of an active interaction scenario in which human operators have to guide a swarm in an environment.

In human-swarm interaction, as for any interactive system, it is fundamental to understand the psychological impact of the system on a human operator. To date, in human-swarm interaction research, such understanding is very limited and, worse, is often based purely on the study of simulated systems.

In this study, we showed that performing a human-swarm interaction study with real robots, compared to simulated robots, significantly changes how humans react psychophysiologically. We, therefore, recommend using real robots as much as possible for human-swarm interaction research. We also showed that in simulation, a swarm displayed in virtual reality tends to provoke stronger responses than a swarm displayed on a computer screen. These results, therefore, tend to show that if it is not possible for a researcher to use real robots, virtual reality is a better choice than simulation on a computer screen.

Even though more research is needed to confirm this statement, we encourage researchers in human-swarm interaction to consider using virtual reality when it is not possible to use a swarm of real robots.