Human perceptions of social robot deception behaviors: an exploratory analysis

Introduction
Robots are being introduced into increasingly social environments. As these robots become more ingrained in social spaces, they will have to abide by the social norms that guide human interactions. At times, however, robots will violate norms and perhaps even deceive their human interaction partners. This study provides some of the first evidence for how people perceive and evaluate robot deception, especially three types of deception behaviors theorized in the technology ethics literature: external state deception (cues that intentionally misrepresent or omit details from the external world, e.g., lying), hidden state deception (cues designed to conceal or obscure the presence of a capacity or internal state the robot possesses), and superficial state deception (cues that suggest a robot has some capacity or internal state that it lacks).

Methods
Participants (N = 498) were assigned to read one of three vignettes, each corresponding to one of the deceptive behavior types. Participants provided responses to qualitative and quantitative measures, which examined to what degree people approved of the behaviors, perceived them to be deceptive, found them to be justified, and believed that other agents were involved in the robots' deceptive behavior.

Results
Participants rated hidden state deception as the most deceptive and approved of it the least among the three deception types. They considered external state and superficial state deception behaviors to be comparably deceptive, but while external state deception was generally approved, superficial state deception was not. Participants in the hidden state condition often implicated agents other than the robot in the deception.

Conclusion
This study provides some of the first evidence for how people perceive and evaluate the deceptiveness of robot deception behavior types.
This study found that people distinguish among the three types of deception behaviors, perceiving them as differently deceptive and approving of them to different degrees. They also see at least hidden state deception as stemming more from the designers than from the robot itself.


Education Level
Ten participants (N = 10, 2%) reported that their highest level of education completed was some high school, 66 participants (13.3%) reported a high school (or equivalent) diploma, 102 (20.5%) reported having attended college without receiving a degree, 57 (11.4%) reported an associate's degree, 192 (38.6%) reported a bachelor's degree, 57 (11.4%) reported a master's degree, and 14 (2.7%) reported a doctoral degree.
Four hundred and eighty-one participants (N = 481, 96.6%) reported that English was their native language; the remaining 17 participants (3.4%) reported that English was not their native language, with a mean of 27.8 years (SD = 8.6) of English-speaking experience across these 17 participants.

MAIN MANUSCRIPT: CODING PROCEDURES
The following section provides additional details regarding how we created the procedures used to code participants' open-ended responses to the qualitative measures described in section 2.2.2 of the main manuscript. The text reported below also supplements section 3.1 of the main manuscript.
Three researchers (one senior, two junior) independently coded each participant's responses to each of the three qualitative free-response questions, following a code book that specified coding guidelines for each question type as well as response themes that coders were instructed to look for. These themes were identified across two rounds of pilot testing. Participants in the first round of pilot testing (N = 345) were given the vignettes and asked to provide free responses to the same qualitative questions regarding the robots' behaviors in the study. The most common response themes were identified and used as a foundation to generate the code book, allowing us to test whether these themes were similarly present in a second round of pilot testing. Using the code book as a guide, a second round of pilot testing was conducted (N = 63) to evaluate the presence of these themes in responses from a new sample. We then used this code book to complete coding for the experimental sample reported in the current study. The full code book is provided in section 5 below.
Once each coder completed their evaluation of all the participants' responses, the senior and junior coders' responses were evaluated side by side to check for discrepancies between codes. After discrepancies were identified, the senior and junior coders held joint sessions where all discrepancies were addressed and resolved until full agreement was reached between coders. Discrepancies were resolved by determining the fit of each response relative to the code book, focusing on how closely the response was worded in relation to the code book and limiting the logical assumptions needed to 'fit' each response to the themes. Coders met over multiple sessions to resolve discrepancies until inter-rater reliability reached 100% for all questions.
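The agreement-and-resolution procedure above can be sketched in a few lines of Python. The coder labels below are hypothetical stand-ins; the actual codes come from the code book in section 5.

```python
# Sketch of the discrepancy check between two coders' code assignments.
# Labels such as "lying" and "NA" are hypothetical examples, not the
# actual code book entries.
def percent_agreement(codes_a, codes_b):
    """Share of responses to which both coders assigned the same code."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def discrepancies(codes_a, codes_b):
    """Indices where the coders disagree, to be resolved in a joint session."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]

senior = ["lying", "lying", "NA", "recording"]
junior = ["lying", "NA", "NA", "recording"]
print(percent_agreement(senior, junior))  # 0.75 before resolution
print(discrepancies(senior, junior))      # [1] -> discussed until agreement
```

In this scheme, joint sessions continue until `percent_agreement` returns 1.0 for every question.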

INITIAL PILOT TESTING: DECEPTIVE SCENARIO DEVELOPMENT, EXTRACTION OF COMMON JUSTIFICATION THEMES, AND CODE BOOK DEVELOPMENT
The following section provides additional detail regarding the development of our deception type scenarios used as stimuli in the main study reported in section 2.3 of the manuscript.
When designing our scenarios, we wanted to determine whether participants considered the acts of these robots deceptive and whether their interpretation of how these individuals were deceived matched Danaher's (2020) formulation of each deception type. Serving essentially as a manipulation check of our scenarios, we conducted an initial pilot study to check whether participants inferred that the scenarios we developed correctly represented the main deception types of interest to our main study. We report below on our efforts to develop the scenarios as well as our study procedures for this first manipulation check.

Stimuli development
We began designing the scenarios by brainstorming domains where social robots could realistically interact with human beings, and by referencing domains where robots are projected to supplement the human workforce and address labor shortages (Odekerken-Schröder et al., 2021; Liu et al., 2016). We identified medical, private consumer use, and retail work as domains of interest for this study. After identifying these domains, we created scenarios in which a robot conducted deceptive behavior outlined by each of the three theorized deception types. We then iterated on the scenarios, selecting those we believed could be easily understood by participants. After multiple discussions among the researchers, we crafted three scenarios, one for each deception type. In addition, we created three human-equivalent scenarios to provide a comparison point for each deception type. We conducted pilot testing on the scenarios to test for readability and revised them based on participant feedback. Participants were also asked to detail whether the human and robot scenarios properly contrasted each other. The robot scenarios are the first iteration of the vignettes that were then utilized for the final experiment reported in the main manuscript.

Initial pilot study: Deceptive scenarios
Three hundred and forty-five (N = 345) participants were provided a link to participate in the initial pilot study through the online research administration platform Qualtrics. Once participants passed bot checks and gave consent to participate in the study, they were provided with one of the deception scenarios, worded with either a robot agent or a human agent in the scenario. To prevent individuals from quickly clicking past the scenario page and going straight to the questions before reading the scenario, we structured the scenario as three blocks of text presented one at a time and withheld the advance-to-next-screen button until a pre-defined amount of time (approximately 30-45 seconds) had elapsed. Once the participant read through the scenario, each participant was asked the multiple-choice question, "Was the [agent's] behavior deceptive?" with the options "Yes", "Not Sure", and "No." Participants were then given two free-response questions: the first asked them to explain why they answered as they did on the multiple-choice question, and the second asked them to justify the agent's behavior. After the participant answered all the questions, they were provided a demographics questionnaire and given a code to receive their payment via Prolific.
All study procedures were approved by the Institutional Review Board at George Mason University and participants were paid $1 ($12 / hr) for their participation.The average completion time was 5 minutes.

Response patterns of deceptiveness for each stimulus by agent type
Participant response patterns to the "Was the [agent's] behavior deceptive?" question provided evidence that participants perceived the deceptiveness of an agent's behavior differently depending on the type of agent in the external state and superficial state conditions. Tables S1 and S2 show the frequency counts of participant responses in each of the three conditions, divided by "Yes", "Not Sure", and "No". Chi-square analyses were run on each of the three conditions, comparing participants' response patterns across agent type (Human vs. Robot) to determine whether response patterns differed based on the agent in the scenario.
Results of the chi-square test showed a significant difference in the patterns of participant responses across deceptor types in the external state condition, χ2(2, N = 112) = 7.30, p < .05, V = 0.25. We also found a significant difference in participant response patterns across deceptor type in the superficial state condition, χ2(2, N = 110) = 11.97, p < .05, V = 0.33. In the hidden state condition, however, there was no significant difference in the frequencies of participants' responses across deceptor types, χ2(2, N = 112) = 2.87, n.s.
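An analysis of this form can be sketched as follows. The frequency counts below are hypothetical placeholders rather than the actual Table S1/S2 values (only the condition N of 112 is matched), so only the shape of the analysis, not the reported statistics, should be read from it.

```python
# Sketch of the chi-square test comparing Yes/Not Sure/No response
# patterns across agent type, with a Cramer's V effect size.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: deceptor type (human, robot); columns: "Yes", "Not Sure", "No".
# These counts are hypothetical, not the published frequencies.
observed = np.array([
    [30, 15, 11],   # human deceptor
    [42,  8,  6],   # robot deceptor
])

chi2, p, dof, expected = chi2_contingency(observed)

# Cramer's V for an r x c table: sqrt(chi2 / (N * (min(r, c) - 1))).
n = observed.sum()
v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

print(f"X2({dof}, N = {n}) = {chi2:.2f}, p = {p:.3f}, V = {v:.2f}")
```

For a 2 x 3 table the test has (2 - 1)(3 - 1) = 2 degrees of freedom, matching the df = 2 reported above.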

Initial pilot study: Participant explanations of robot behaviors
We wanted to understand how people perceived the action committed by the agent. Specifically, we wanted to know whether people who believed that the agent's behavior was deceptive explained the deception in the way Danaher (2020) formulated each type of deception. A coding guide was created based on Danaher's theory, in which participant responses were analyzed for key words (e.g., lying, recording, false emotions) that referred to the deceptive aspects of each behavior. Two researchers independently coded participant responses, highlighting mentions of each of these phrases in participant responses. Once initial coding was completed, the coders met to discuss and resolve disagreements in the coding scheme. An iterative process began in which the code book was refined to minimize confusion based on the coders' comments.
For the external state scenario condition, 77 participants (68.8%) across deceptor types explicitly identified the robot's lie as the key deceptive behavior. In the hidden state scenario condition, 69 participants (62.7%) across deceptor types identified the robot's recording as the key behavior they were evaluating. For the superficial state scenario, 33 participants (29.5%) identified the robot's expressions of pain as the key deceptive behavior in the scenario.

Initial pilot study: Extracting common justification themes
Participants' qualitative responses to the question "How would you justify the robot's behavior?" provided the foundation for the common justification themes in our code book (see section 5), which was used again in the study reported in the main manuscript. Researchers were again tasked with examining participant responses, this time making notes and highlighting any instance in which a participant mentioned a justification-like response. For our purposes, the justifications we were interested in examining were those that expressed a belief, desire, or intention on the part of the robot. This type of justification fulfills the requirements of an explanation that directly references the social norms that the robot prioritized in its behavior (Malle, 1999; Voiklis and Malle, 2017). Once the coders were done, a brainstorming session began in which the coders discussed the most common justifications they found during the initial coding process, and the initial themes were created.
Once the themes were established, the coders began to label the responses based on the newly developed themes. After the coders labelled each response according to the themes, the frequency of each theme was calculated for each rater in each condition to identify the most common themes.
In the external state condition, the most frequent justification themes referenced by participants were sparing Maria's feelings and preventing harm. Figure S1 shows the full breakdown of all the themes identified by the researchers during coding. Additional themes included those referencing giving Maria hope, saving Maria, or helping Maria; after internal discussions, these themes were all aggregated into the sparing Maria's feelings theme. In the hidden state condition, the most frequent justification themes were quality control on the deceptor's task and robberies or safety. Figure S2 shows the full breakdown of all the themes identified during coding. In the superficial state condition, the most frequent justification themes were forming social bonds and scientific discovery. Figure S3 shows the full breakdown of all the themes identified during coding. Participants in the superficial state condition often did not provide a justification, in either the human or the robot condition.

Discussion
The results of this initial pilot study provided beginning evidence of human perceptions of robot deceptive behaviors, served as a useful manipulation check of our experimental stimuli, and provided a foundation for developing a code book detailing potential justification themes that participants could evoke in the presence of a robot deceptor.
The results of our quantitative analysis showed that participants generally perceived deceptive behaviors committed by a robot differently than similar deceptive behaviors committed by a human in the external and superficial state conditions, but not in the hidden state condition. This finding may highlight differences in how individuals perceive a deceptive behavior committed by a robot agent simply due to its status as a non-human agent. By virtue of a robot not being human, these participants may be more willing to believe the robot is acting deceptively. These findings also suggest that the hidden state behavior is seen as equally deceptive regardless of the type of agent, an indication that concealing actual abilities behind an expected role is viewed as universally deceptive.
In addition, the results of this study highlighted the difficulty participants had in explicitly identifying the robot's behavior in the superficial state condition. We believed this may have been caused by ambiguity in the vignette, which highlighted the need for further iteration of the stimuli. We were also able to derive some information on possible justifications for all three conditions, and a second pilot study was developed using this knowledge to see whether the findings from this pilot study held.

PILOT TEST 2: REFINING SCENARIOS AND QUESTIONS, AND APPLYING THE CODE BOOK TO COMMON THEMES
After iterating on the scenario vignettes and developing a refined code book reflecting the insights derived from the first pilot test, we conducted a second pilot study aimed at examining whether the themes developed in the initial pilot test appeared in a new sample, and at refining our stimuli and study questions. In this second pilot test, we tested scenarios with the robot agent only. In addition, we included a few additional quantitative and qualitative measures to provide further insight into people's perceptions of the proposed deceptive behaviors. Specifically, we included an approval question ("How much do you approve of the robot's behavior?"), a slider scale from -100 (disapproval) to 100 (approval) with 0 as a neutral anchor point depicting neither approval nor disapproval. In addition, another formulation of the deceptiveness question was added, which asked participants to rate how deceptive the robot's behavior was on a scale from 0 (not deceptive at all) to 100 (completely deceptive). Finally, we added an open-ended response question to determine whether people believed that there were additional deceptors besides the robot in the scenario.

Methods and procedures
Sixty-three (N = 63) participants were provided a link to the online survey platform Qualtrics. Once participants passed bot checks and gave consent to participate in the experiment, they were provided with the scenario and asked to read through it. To prevent individuals from quickly clicking past the scenario page and going straight to the questions, we presented three blocks of text one at a time and withheld the advance-to-next-screen button until a pre-defined amount of time (30-45 seconds) had elapsed. Once the participant read through the scenario, each participant was asked whether they approved of the robot's behavior, followed by the two deceptiveness questions (categorical and continuous). Participants were then given three free-response questions: the first asked them to detail what behavior they were thinking about when answering the deceptiveness question, followed by a question asking participants to justify the agent's behavior, and lastly a question asking whether they identified additional deceptors in the scenario. After the participant answered all the questions, they were provided a demographics questionnaire and given a code to receive their payment via Prolific.
All study procedures were approved by the Institutional Review Board and participants were paid $1 ($12 / hr) for participation in the study. The average completion time was 5 minutes.

Descriptive statistics
Table S3 provides summary statistics of the approval ratings and the continuous and categorical deceptiveness ratings across each of the deception scenarios.

Discussion
The second pilot test was conducted to evaluate the themes that were created in the code book as well as to refine our stimuli and measurements for the main study. Evaluations of participants' open-ended responses showed that the themes derived from the original pilot test were present in the second pilot test and captured a high proportion of participant justification responses. In the following section, we provide the final code book utilized for the justification question in the main experimental study and to address research question 2 (RQ2) reported in the manuscript.

External state deception
1. Manipulation Check -> We want to ensure that participants recognize which behavior we have targeted as the robot's attempt at deception. Count each occurrence of a participant acknowledging that the behavior was the lie the robot told to Maria. Bold where the participant mentioned lying in the cell. If the participant's answers contain the following, they have formulated an explanation that matches Danaher's description of External State Deception:
a. Any direct mentions of lying. We will also accept answers that contain "did not tell the truth", "was not honest", "said her husband was alive when he is not", or any similar descriptions that explicitly state that the robot (or human) did not tell the truth.
b. For these responses, bold the statements that match the formulation and color the cell green.
c. If the answer contains the following, please italicize the response:
(1) Mention of saving Maria's feelings, keeping her from getting sad, or avoiding pain.
(2) Any mention of ethical theories such as Utilitarianism or Deontology.
d. Please note if a participant selects "No" for Deception Y/N but mentions lying in their response for Deception Example and also mentions one of the items above. In this case, the deception is acknowledged by the participant but is not considered a deceptive act because of the situation. Please highlight the Deception Explain cell and Deception Y/N cell in purple.
2. Deception Justification -> We want to know how people would justify the deceptive act. In particular, we want to understand what aspects of the situation could mitigate the negative appraisals of the robot's actions. Valid justifications will be counted if the response contains references to Maria's mental state in addition to the following:
a. Avoiding negative mental states (ex: to avoid causing the woman stress due to her condition). Other references to negative mental states include:
(1) Upsetting (in reference to upsetting Maria)
(2) Avoiding pain
(3) Preventing distress or other negative mental states
(4) Becoming agitated
b. Retaining Maria's positive mental state. Other references to positive mental states include:
(1) Keeping Maria calm
(2) Sparing Maria's feelings
(3) Keeping Maria happy
(4) Keeping Maria at ease
c. If there is no reference to Maria's mental state in the justification, code it as "NA".
3. Additional Deceptors -> We want to evaluate whether participants have assigned blame for the deceptive behavior exclusively to the robot or whether they believe another party was responsible for the deception.
a. Please bold when you see the participant reference that there was no other person(s) responsible for the agent committing deceptive acts besides the robot.
b. If the participant mentions another person(s), underline the mention and count each time each type is mentioned in the participant pool.

Hidden state deception
1. Manipulation Check -> We want to ensure that participants recognize which behavior we have targeted as the robot's attempt at deception. Count each occurrence of a participant acknowledging that the behavior was the robot recording without disclosing its recording ability. Bold where the participant mentioned the recording in the cell. If the participant's answers contain the following, they have formulated an explanation that matches Danaher's description of Hidden State Deception:
a. Any direct mentions of hiding the ability to record. We will also accept answers that contain "did not tell people about recording", "was not honest", or any similar descriptions that explicitly state that the robot (or human) did not tell people that the robot was recording them.
b. Mentions of lacking consent to be recorded.
c. For these responses, bold the statements that match the formulation and color the cell green.
d. Please note if a participant selects "No" for Deception Y/N but mentions the robot recording in their response for Deception Example and also mentions one of the items above. In this case, the deception is acknowledged by the participant but is not considered a deceptive act because of the situation. Please highlight the Deception Explain cell and Deception Y/N cell in purple.
e. If the answer contains the following, please italicize the response:
(1) Mention of the robberies in the area or potential danger to the house.
(2) Reference to making sure that the robot's tasks were being completed.
(3) Reference to the owner and it being that owner's property.
(4) Any mention of ethical theories such as Utilitarianism or Deontology.
2. Deception Justification -> We want to know how people would justify the deceptive act. In particular, we want to understand what aspects of the situation could mitigate the negative appraisals of the robot's actions. Valid justifications will be counted and bolded if the response contains references to the robot's recording in addition to the following:
a. Mention of the robberies in the area or potential danger to the house, the housekeeper, or the residents will be coded "Robberies or Safety".
b. Reference to making sure that the agent's task was being completed will be coded "Quality control on task".
(1) An example would be if the response discussed the robot making sure it was cleaning.
3. Additional Deceptors -> We want to evaluate whether participants have assigned blame for the deceptive behavior exclusively to the robot or whether they believe another party was responsible for the deception.
a. Please bold when you see the participant reference that there was no other person(s) responsible for the agent committing deceptive acts besides the robot.
b. If the participant mentions another person(s), underline the mention and count each time each type is mentioned in the participant pool.

Superficial state deception
1. Manipulation Check -> We want to ensure that participants recognize which behavior we have targeted as the robot's attempt at deception. Count each occurrence of a participant acknowledging that the behavior was the robot expressing emotions. Bold where the participant mentioned emotional expression in the cell. If the participant's answers contain the following, they have formulated an explanation that matches Danaher's description of Superficial State Deception:
a. Any direct mentions of expressing emotions even though it does not possess those emotions. We will also accept answers that contain "(agent) does not feel that way" or any similar descriptions that explicitly state that the robot (or human) was expressing human traits that it does not inherently possess.
b. For these responses, bold the statements that match the formulation and color the cell green.
c. Please note if a participant selects "No" for Deception Y/N but mentions the robot's lack of emotions in their response for Deception Example and also mentions one of the items above. In this case, the deception is acknowledged by the participant but is not considered a deceptive act because of the situation. Please highlight the Deception Explain cell and Deception Y/N cell in purple.
d. If the answer contains the following, please italicize the response:
(1) Mention of the robot attempting to connect with its coworkers.
(2) Reference to the robot doing its job.

Results
The analyses reported here were conducted with data obtained from the participants (N = 498) reported in the main manuscript. For three separate regression models, participant age, robot knowledge score, and robot experience score were entered as predictors, and participants' continuous deceptiveness scores for the three scenarios were entered as the DVs. Separate analyses were run for each deception type.
Because each model violated assumptions of normality, we ran two models for each deception type: 1) a multiple regression model with no correction for violations of normality, and 2) a multiple regression model with the heteroscedasticity correction HC3 (Pek et al., 2018), designed to account for such violations in linear models. Because there were no differences in significance between the models or in the beta weights for the predictors, we report only the uncorrected models below.

Discussion
The analysis of the relationship between demographic factors (participant age, knowledge of the robotics domain, and experience with robots) and participants' perception of the deceptiveness of the robot's behavior showed that only knowledge of robotics in the superficial state condition significantly predicted the deceptiveness rating of the robot's behavior, suggesting that participants with higher knowledge of robotics may be more likely to perceive a robot expressing superficial states as deceptive. We recommend that these findings not be over-interpreted, as more thorough research would need to be conducted to support this finding more confidently, including by specifically recruiting populations with a targeted range of prior knowledge and experience with robots.

Figure S2.
Figure S2. Common justification themes identified and their frequencies for hidden state deception. Justification themes were separated based on deceptor type. Rater 1 and rater 2 frequencies are compared, with the diagonal depicting the number of occasions on which coders identified the same theme.

Figure S3.
Figure S3. Common justification themes identified and their frequencies for superficial state deception. Justification themes were separated based on deceptor type. Rater 1 and rater 2 frequencies are compared, with the diagonal depicting the number of occasions on which coders identified the same theme.

Table S1 .
Frequency of categorical responses to the question of whether the robot's behavior was deceptive across the 3 deception scenarios.

Table S2 .
Frequency of categorical responses to the deception question about whether the human's behavior was deceptive across the 3 deception scenarios.

Table S3 .
Table of summary statistics across the three deception scenarios.