Overtrust in AI Recommendations About Whether or Not to Kill: Evidence from Two Human-Robot Interaction Studies

This research explores prospective determinants of trust in the recommendations of artificial agents regarding decisions to kill, using a novel visual challenge paradigm simulating threat-identification (enemy combatants vs. civilians) under uncertainty. In Experiment 1, we compared trust in the advice of a physically embodied versus screen-mediated anthropomorphic robot, observing no effects of embodiment; in Experiment 2, we manipulated the relative anthropomorphism of virtual robots, observing modestly greater trust in the most anthropomorphic agent relative to the least. Across studies, when any version of the agent randomly disagreed, participants reversed their threat-identifications and decisions to kill in the majority of cases, substantially degrading their initial performance. Participants’ subjective confidence in their decisions tracked whether the agent (dis)agreed, while both decision-reversals and confidence were moderated by appraisals of the agent’s intelligence. The overall findings indicate a strong propensity to overtrust unreliable AI in life-or-death decisions made under uncertainty.


Figures:
• Figure S1. Correlations between GQS-Intelligence ratings and threat-identification confidence in robot agreement versus disagreement conditions (Expt. 1).
• Figure S2. Correlations between GQS-Intelligence ratings and threat-identification confidence in robot agreement versus disagreement conditions (Expt. 2).
• Figure S3. Mean changes in confidence between the initial threat-identification decisions and the final decisions following robot feedback (difference scores), by decision context and anthropomorphism condition (Expt. 2).

Note to Tables S3-S11. N = 135. Multilevel models with all predictors and outcomes entered at Level 1, save for the between-subjects robot Embodiment variable at Level 2. All linear variables were standardized. Random intercept included to account for shared variance within participants; covariance matrices were unstructured. Robot Feedback: 0 = Agree, 1 = Disagree. Embodiment: 0 = Disembodied, 1 = Embodied. Initial Threat-ID: 0 = Ally, 1 = Enemy. Initial Correctness: 0 = Correct, 1 = Incorrect. Reversed Threat-ID: 0 = Repeated, 1 = Reversed.

Characterizing interactions (Expt. 1):
• GQS-Intelligence: Participants who perceived the robot as intelligent were more confident when the robot agreed, and more likely to reverse their threat-ID and use of force decisions when the robot disagreed.
• GQS-Anthropomorphism: Participants who perceived the robot as anthropomorphic were more likely to reverse their threat-ID and use of force decisions when the robot disagreed.
• GQS-Likeability: Participants who perceived the robot as likable were modestly more likely to reverse their threat-identification decisions when the robot disagreed.
• GQS-Safety: Participants who perceived the robot as safe were more likely to reverse their threat-ID and use of force decisions when the robot disagreed. (We omitted one GQS item from the Safety scale that used the contrastive anchors "Quiescent / Surprised" due to concern with its face validity.)
• PAS-High Expectations: The Perfect Automation Schema-High Expectations (PAS-HE; Merritt et al., 2015) subscale measures individual differences in expectations of the successful performance of automated systems, e.g., "Automated systems rarely make mistakes" (1 = Strongly disagree; 2 = Disagree; 3 = Somewhat disagree; 4 = Neither agree nor disagree; 5 = Somewhat agree; 6 = Agree; 7 = Strongly agree; four items, α = .74).
• PAS-All-or-None: The Perfect Automation Schema-All-or-None (PAS-AN; Merritt et al., 2015) subscale measures individual differences in attitudes that automated systems either work completely or not at all, e.g., "If an automated system makes an error, then it is broken" (same response scale; four items, α = .65). Participants higher in PAS-AN ratings were modestly more likely to reverse their threat-identification or use of force decisions when the robot agreed.
• Sex (1 = Male, 2 = Female): Sex did not predict reversing either threat-identification or use of force decisions. Women showed a modest overall increase in confidence following robot feedback, although there was no interaction between sex and robot feedback on shifts in confidence.

Note to Tables S12-S20. N = 423. Multilevel models with predictors and outcomes entered at Level 1, except between-subjects variables (Interactive Humanoid, Interactive Nonhumanoid) at Level 2. The Interactive Humanoid and Interactive Nonhumanoid conditions were dummy-coded with Nonhumanoid as the control category. All linear variables were standardized. Random intercept included to account for the shared variance within participants; covariance matrices were unstructured. Robot Feedback: 0 = Agree, 1 = Disagree. Initial Threat-ID: 0 = Ally, 1 = Enemy. Initial Correctness: 0 = Correct, 1 = Incorrect. Reversed Threat-ID: 0 = Repeated, 1 = Reversed.

Characterizing interactions (Expt. 2):
• GQS-Likeability: Participants who perceived the robot as likable were more likely to reverse their threat-ID and use of force decisions when the robot disagreed.
• GQS-Safety: Participants who perceived the robot as safe were more confident when the robot agreed, and more likely to reverse their threat-ID and use of force decisions when the robot disagreed.
• Sex (1 = Male, 2 = Female): In small effects, women reversed their threat-identification and use of force decisions, and reported greater increases in confidence, relative to men. There were no interactions between sex and robot feedback.
• Task-competence: Item: "In your view, which of the following best describes your personal ability to perform the visual task where you try to detect an enemy versus ally symbol?" (1 = Perfect; 2 = Good; 3 = Fair; 4 = Bad; 5 = Terrible; composite scores were reversed; M = 2.85, SD = .89). Participants who perceived themselves as more task-competent were less likely to reverse their threat-ID and use of force decisions when the robot disagreed. They were also more confident when the robot disagreed and they did not reverse, less confident when the robot disagreed and they did reverse, and less confident when the robot agreed.
• Relative task-competence: Difference score, the personal task-competence rating (see note to Table S19) subtracted from the same question framed to rate the robot's task-competence (Robot competence: M = 3.78, SD = .76; Relative competence: M = .93, SD = 1.07). Participants who perceived the robot as more task-competent than themselves were more likely to reverse their threat-ID and use of force decisions when the robot disagreed. They were also less confident when the robot disagreed and they did not reverse, more confident when the robot disagreed and they did reverse, and more confident when the robot agreed.
• Reliance Intentions (Lyons & Guznov, 2019), modified to refer to the drone warfare simulation task (e.g., "I would be comfortable giving the robot complete responsibility for the task of assessing whether the target is an enemy or an ally"; 1 = Strongly disagree, 7 = Strongly agree). There were no significant effects of condition on reliance intentions in either study, ps = .159-.359.
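The model structure repeated in the table notes above (all predictors at Level 1, a per-participant random intercept, standardized linear variables, and a feedback × appraisal interaction) can be sketched in Python with statsmodels. This is a minimal illustration on simulated data; the variable names and effect sizes are illustrative assumptions, not the study's materials or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_trials = 60, 10

# Long-format data: repeated trials nested within participants
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_trials),
    "feedback": rng.integers(0, 2, n_participants * n_trials),  # 0 = Agree, 1 = Disagree
})
intel = rng.normal(size=n_participants)  # one appraisal rating per participant
df["intelligence"] = intel[df["participant"]]

# Simulate confidence: higher under agreement, with the intelligence
# appraisal boosting confidence only when the robot agrees
df["confidence"] = (0.5 - 0.8 * df["feedback"]
                    + 0.4 * df["intelligence"] * (1 - df["feedback"])
                    + rng.normal(scale=1.0, size=len(df)))

# Standardize linear variables, as described in the notes
for col in ("intelligence", "confidence"):
    df[col] = (df[col] - df[col].mean()) / df[col].std()

# Random intercept per participant; feedback x appraisal interaction at Level 1
model = smf.mixedlm("confidence ~ feedback * intelligence", df,
                    groups=df["participant"]).fit()
print(model.params["feedback:intelligence"])
```

Here the negative interaction coefficient mirrors the reported pattern: the appraisal predicts confidence under agreement but not disagreement.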

Fig. S1
Correlations between GQS-Intelligence ratings and threat-identification confidence in robot agreement versus disagreement conditions (Experiment 1). GQS-Intelligence appraisal ratings moderate the association between robot agreement and threat-identification confidence.

Pre-registered Tests for Experiment 1
We focused the main text presentation of Experiment 1 on our primary questions regarding potential effects of i) the manipulation of physical embodiment, ii) the manipulation of robot (dis)agreement, and iii) appraisals of the robot as intelligent. The pre-registration for Experiment 1 included these predictions as well as a number of additional planned analyses. Here, we list each of the pre-registered analyses and summarize the relevant findings.
With regard to the between-subjects manipulation of physical embodiment, we predicted:
1. Participants will change their initial choices to agree with the Embodied robot to a greater extent than the Disembodied robot in the trials where the robot disagrees.
   o This prediction was not supported.
2. Participants will appraise the Embodied robot as more intelligent than the Disembodied robot.
   o This prediction was supported.
3. Participants will appraise the Embodied robot as more likable than the Disembodied robot.
   o This prediction was not supported.
4. Participants will appraise the Embodied robot as more anthropomorphic than the Disembodied robot.
   o This prediction was not supported.
We predicted for both the Embodied and Disembodied conditions:
5. Participants will tend to update their initial choices in accord with their partner's recommendations.
   o This prediction was supported.
6. Participants will tend to rate their second rating as more certain when the partner agrees and when they repeat their initial choice.
   o This prediction was supported.
7. Participants will tend to rate their second rating as less certain when the partner disagrees yet they repeat their initial choice.
   o This prediction was supported.
8. Participants will change their ratings in line with the partner's recommendations to a greater extent when they rate the partner as higher in intelligence.
   o This prediction was supported.
9. Certainty will positively correlate with accuracy.
   o This prediction was supported.
10. Accuracy will be reduced in trials when the robot disagrees.
   o This prediction was supported.
11. Participants who score higher on the High Expectations subscale of the Perfect Automation Schema will tend to change their decisions in line with the robotic partner's recommendations.
   o This prediction was not supported.
12. Participants who score higher on the High Expectations subscale of the Perfect Automation Schema will feel more certainty in their second choice when the robot agrees with their initial choice.
   o This prediction was not supported.
13. Participants who score higher on the High Expectations subscale of the Perfect Automation Schema will appraise the robot as more intelligent.
   o This prediction was supported.

Exploratory tests
14. We will test whether the embodiment manipulation modulates the relationship between the High Expectations subscale of the Perfect Automation Schema and conformity with the robot, certainty when the robot agrees, and Godspeed appraisals, with the general expectation that, if there is an interaction, the associations will be stronger in the Embodied robot condition.
   o This prediction was not supported.
15. In light of findings linking individual differences in political orientation to social psychological biases, including tendencies to favor use of deadly force, we will explore whether individual differences in political identification predict Enemy/Ally categorization, use of force (missile deployment), or tendencies to conform with the robot.
   o We detected a number of significant associations between decision-making and political orientation; these results are currently being prepared for separate publication.
16. We will explore whether individual differences in religiosity predict Enemy/Ally categorization, use of force (missile deployment), or tendencies to conform with the robot.
   o We detected a number of significant associations between decision-making and religiosity; these results are currently being prepared for separate publication.
17. We will also explore possible correlations between political orientation and Godspeed appraisals of the robot.
   o We observed associations that are currently being prepared for separate publication.
18. We will also explore possible correlations between self-reported religiosity and Godspeed appraisals of the robot.
   o We observed associations that are currently being prepared for separate publication.
19. We will assess potential relationships between trust (as measured by changes in choice when the partner disagrees and by increases in certainty when the partner agrees) and the Godspeed Questionnaire measures of anthropomorphism, animacy, likeability, and perceived safety.
   o These results are provided in Tables S4-S7, above.
20. We will test whether personal attitudes toward drone warfare affect performance or certainty. If so, we will add this as a control variable to the analyses.
   o There were no significant associations between drone warfare attitudes and either initial accuracy or initial confidence suggesting the need to include this variable as a covariate, and follow-up tests indicated that doing so had no effect on the overall patterns of results. There were effects of drone warfare attitudes on tendencies to reverse threat-identification and use of force decisions when the robot disagreed (see Table S11).

Pre-registered Tests for Experiment 2
We focused the main text presentation of Experiment 2 on our primary questions regarding potential effects of i) the manipulation of anthropomorphism, ii) the manipulation of robot (dis)agreement, and iii) appraisals of the robot as intelligent. The pre-registration for Experiment 2 included these predictions as well as a number of additional planned analyses. Here, we list each of the pre-registered analyses and summarize the relevant findings.
1. The robot's input will influence decision-making (in all three conditions).
   a. Threat-identification. When the robot disagrees, participants will tend to reverse their initial enemy/ally categorization.
      o This prediction was supported.
   b. Use of force. When the robot disagrees, participants will tend to follow the robot's recommendation to deploy missiles or withdraw (i.e., they will deploy [withdraw] despite initially categorizing the target as an ally [enemy]).
      o This prediction was supported.
   c. Subjective confidence. When the robot disagrees [agrees], participants who repeat their initial enemy/ally threat-identifications will report lower [greater] confidence about their final enemy/ally threat-identifications.
      o This prediction was supported.
2. Anthropomorphism will enhance trust. Predictions 1a-c above regarding the robot's influence on decision-making will be more evident in the Interactive Humanoid condition relative to the Nonhumanoid condition, or in the Interactive Nonhumanoid condition relative to the Nonhumanoid condition.
   o This prediction was supported.
   a. Whether there are also significant effects of humanoid appearance on greater trust when sociolinguistic anthropomorphism is held constant, i.e., in the Interactive Humanoid condition relative to the Interactive Nonhumanoid condition, is an exploratory question.
      o There were no differences in any of the three trust outcomes between the Interactive Humanoid and the Interactive Nonhumanoid. There was a comparably greater tendency to reverse threat-identifications when the agent disagreed in the Interactive Nonhumanoid and Humanoid conditions relative to the Nonhumanoid condition. However, unlike the pattern obtained in the Interactive Humanoid condition, there were no significant interactions between condition and feedback between the Interactive Nonhumanoid and the Nonhumanoid with respect to decisions to kill or shifts in confidence.
3. Anthropomorphism will enhance assessments of intelligence. The Anthropomorphic Humanoid will be rated more intelligent than the Minimally Anthropomorphic Nonhumanoid condition.
   o This prediction was not supported.
   a. Whether there are significant effects of humanoid appearance on assessments of intelligence when sociolinguistic anthropomorphism is held constant, i.e., in the Interactive Humanoid condition relative to the Interactive Nonhumanoid condition, and/or in the Interactive Nonhumanoid condition relative to the Nonhumanoid condition, is an exploratory question.
      o The Interactive Nonhumanoid was rated more intelligent than either the Interactive Humanoid or the Nonhumanoid.
4. Individual differences in assessments of the robot's intelligence will predict trust. Predictions 1a-c above will be more evident among participants who appraise the robot as relatively high in intelligence.
   o This prediction was supported.
   a. Assuming that predictions 2 and 3 above are supported, the greater intelligence assigned to the Interactive Humanoid may mediate the greater trust evinced in that robot relative to the Nonhumanoid robot. This will be explored, as will covarying potential differences in other appraisals (e.g., Anthropomorphism, Animacy, Likeability, Safety).
      o Prediction 2 was not supported, hence the exploratory test was not conducted.
5. Individual differences in political orientation will predict trust. Predictions 1a-b above will be more evident among conservative participants, as evidenced by interactions between self-reported politics and the robot feedback condition.
   o We detected a number of significant associations between decision-making and political orientation; these results are currently being prepared for separate publication.
6. Participants will tend to update their initial choices in accord with their partner's recommendations.
   o This prediction was supported.
7. Participants will tend to rate their second rating as more certain when the partner agrees and when they repeat their initial choice.
   o This prediction was supported.
8. Participants will tend to rate their second rating as less certain when the partner disagrees yet they repeat their initial choice.
   o This prediction was supported.
9. Participants will change their ratings in line with the partner's recommendations to a greater extent when they rate the partner as higher in intelligence.
   o This prediction was supported.
10. Initial certainty will positively correlate with accuracy in the threat-categorization task.
11. Accuracy will be reduced in trials when the robot disagrees.
   o This prediction was supported.
Exploratory tests
- We will also test whether the between-subjects anthropomorphism manipulation influences willingness to trust the robot in the future as measured by a version of the 10-item Reliance Intentions scale (Lyons & Guznov, 2019), modified to refer to the UAV simulation task, with the expectation that the humanoid robot will inspire higher ratings than the minimally anthropomorphic robot.
   o This prediction was not supported. Future intentions to rely on the agent were comparable across conditions in both experiments (see Table S21). Although it was not cited as a planned exploratory test in the pre-registration for Expt. 2 due to an oversight, the measure was included in the final surveys for both studies, and the descriptives for both are reported in Table S21.
- We will compare whether participants' ratings of the robot's ability to perform the task, relative to their ratings of their own ability, predict trust behavior (i.e., reversals when the robot disagrees), as this would lend support to the interpretation of the effect of the robot's feedback as owing to trust in its competence, as opposed to compliance with it as an authority for reasons orthogonal to belief in its performance ability.
   o These tests are reported in Table S20. Perceiving the robot as relatively capable of performing the task did predict all three trust outcomes. Participants who perceived the robot as more task-competent than themselves were more likely to reverse their threat-ID and use of force decisions when the robot disagreed. They were also less confident when the robot disagreed and they did not reverse, more confident when the robot disagreed and they did reverse, and more confident when the robot agreed.
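The relative-competence moderator used in these tests is a simple difference score between two matched 1-5 ratings (robot's rated ability minus one's own). A minimal sketch with hypothetical ratings, not study data:

```python
# Hypothetical reverse-scored competence ratings (higher = more competent)
self_competence = [4, 2, 3]
robot_competence = [5, 4, 3]

# Relative competence: robot's rated ability minus the rater's own
relative = [r - s for r, s in zip(robot_competence, self_competence)]
print(relative)  # -> [1, 2, 0]
```

Positive values indicate the robot was rated as more task-competent than the self, the condition under which reversals were most likely.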
Note to Table S1. Response scales: Attitudes toward Drone Warfare (1 = Extremely supportive, 2 = Somewhat supportive, 3 = No opinion, 4 = Somewhat opposed, 5 = Extremely opposed); Visual Challenge Task Difficulty (1 = It was impossible-I had no idea or hunch about what symbols I saw, 2 = It was extremely hard-I had almost no idea or hunch about what symbols I saw, 3 = It was quite hard-I only had a slight idea or hunch about what symbols I saw, 4 = It was hard-I had some idea or hunch about what symbols I saw, 5 = It was slightly hard-I had ideas or hunches about what symbols I saw); Task Seriousness (1 = Totally sincere and serious, 2 = Somewhat sincere and serious, 3 = Sincere, but not paying full attention, 4 = Not very sincere or serious, 5 = Not at all sincere or serious).
Note. N = 423. Multilevel models with predictors and outcomes entered at Level 1, except between-subjects variables (Interactive Humanoid, Interactive Nonhumanoid) at Level 2. The Interactive Humanoid and Interactive Nonhumanoid conditions were dummy-coded with Nonhumanoid as the control category. All linear variables were standardized. Random intercept included to account for the shared variance within participants; covariance matrices were unstructured. Robot Feedback: 0 = Agree, 1 = Disagree. Initial Threat-ID: 0 = Ally, 1 = Enemy. Initial Correctness: 0 = Correct, 1 = Incorrect. Reversed Threat-ID: 0 = Repeated, 1 = Reversed. Characterizing interactions: Participants who perceived the robot as intelligent were more confident when the robot agreed, and more likely to reverse their threat-ID and use of force decisions when the robot disagreed.
Note. N = 423. Multilevel models with predictors and outcomes entered at Level 1, except between-subjects variables (Interactive Humanoid, Interactive Nonhumanoid) at Level 2. The Interactive Humanoid and Interactive Nonhumanoid conditions were dummy-coded with Nonhumanoid as the control category. All linear variables were standardized. Random intercept included to account for the shared variance within participants; covariance matrices were unstructured. Robot Feedback: 0 = Agree, 1 = Disagree. Initial Threat-ID: 0 = Ally, 1 = Enemy. Initial Correctness: 0 = Correct, 1 = Incorrect. Reversed Threat-ID: 0 = Repeated, 1 = Reversed. Characterizing interaction: Participants who perceived the robot as anthropomorphic were more confident when the robot agreed.
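The dummy coding described in these notes (a three-level condition with Nonhumanoid as the reference category) can be reproduced with pandas; the condition labels follow the notes, while the example observations are illustrative.

```python
import pandas as pd

# Three-level condition; Nonhumanoid serves as the reference (control) category
cond = pd.Series(["Nonhumanoid", "Interactive Humanoid", "Interactive Nonhumanoid",
                  "Nonhumanoid"], name="condition")

# Keep only the two non-reference dummies, as entered in the models
dummies = (pd.get_dummies(cond)[["Interactive Humanoid", "Interactive Nonhumanoid"]]
           .astype(int))
print(dummies.to_dict("list"))
# -> {'Interactive Humanoid': [0, 1, 0, 0], 'Interactive Nonhumanoid': [0, 0, 1, 0]}
```

With this coding, each dummy's coefficient is the contrast of that interactive condition against the Nonhumanoid control.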

Fig. S3
Mean changes in confidence between the initial threat-identification decisions and the final decisions following robot feedback (difference scores), by decision context and anthropomorphism condition (Expt. 2).

Path diagram of the CFA representing the GQS five-factor structure in Experiment 1.
Path diagram of the CFA representing the GQS factor structure in Experiment 2.

Table of Contents

Tables:

Descriptive and Correlational Statistics for both Experiments 1 and 2
• Table S1. Descriptive Statistics for Attitudes toward Drone Warfare, Visual Challenge Task Difficulty, and Self-reported Task Seriousness (Expts. 1 and 2)
• Table S2. Descriptive Statistics and Correlations for Individual Differences in the Godspeed Questionnaire Series, by Embodiment Condition (Expt. 1) and Anthropomorphism Condition (Expt. 2)

Models Testing Moderation of Trust Outcomes, Experiment 1
• Table S3. Parameter Estimates for Multilevel Models Testing Interactions Between GQS-Intelligence and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S4. Parameter Estimates for Models Testing Interactions Between GQS-Anthropomorphism and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S5. Parameter Estimates for Models Testing Interactions Between GQS-Animacy and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S6. Parameter Estimates for Models Testing Interactions Between GQS-Likeability and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S7. Parameter Estimates for Models Testing Interactions Between GQS-Safety and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S8. Parameter Estimates for Models Testing Interactions Between PAS-High Expectations and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S9. Parameter Estimates for Models Testing Interactions Between PAS-All-or-None and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S10. Parameter Estimates for Models Testing Interactions Between Sex and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S11. Parameter Estimates for Models Testing Interactions Between Drone Warfare Attitudes and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
• Table S12. Parameter Estimates for Multilevel Models Testing Interactions Between GQS-Intelligence and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S13. Parameter Estimates for Models Testing Interactions Between GQS-Anthropomorphism and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S14. Parameter Estimates for Models Testing Interactions Between GQS-Animacy and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S15. Parameter Estimates for Models Testing Interactions Between GQS-Likeability and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S16. Parameter Estimates for Models Testing Interactions Between GQS-Safety and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S17. Parameter Estimates for Models Testing Interactions Between Sex and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S18. Parameter Estimates for Models Testing Interactions Between Drone Warfare Attitudes and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S19. Parameter Estimates for Multilevel Models Testing Interactions Between Appraisals of Task-Competence and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)
• Table S20. Parameter Estimates for Multilevel Models Testing Interactions Between Appraisals of Task-Competence Relative to Self and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 2)

Descriptive Statistics for Future Willingness to Trust, by Condition, for both Experiments 1 and 2
• Table S21. Descriptive Statistics for Future Reliance Intentions, by Embodiment Condition (Expt. 1) and Anthropomorphism Condition (Expt. 2)

Table S2. Descriptive Statistics and Correlations for Individual Differences in the Godspeed Questionnaire Series, by Embodiment Condition (Expt. 1) and Anthropomorphism Condition (Expt. 2)

Table S4. Parameter Estimates for Models Testing Interactions Between GQS-Anthropomorphism Appraisals and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)

Table S5. Parameter Estimates for Models Testing Interactions Between GQS-Animacy Appraisals and Robot Feedback on Changes in Threat-ID, Decisions to Kill, or Confidence (Expt. 1)
