The Effect of Informing Participants of the Response Bias of an Automated Target Recognition System on Trust and Reliance Behavior

Objective: To determine how changing and informing a user of the false alarm (FA) rate of an automated target recognition (ATR) system affects the user’s trust in and reliance on the system and their performance during an underwater mine detection task.
Background: ATR systems are designed to operate using a high sensitivity and a liberal decision criterion to reduce the risk of the ATR system missing a target. A high number of FAs in general may lead to a decrease in operator trust and reliance.
Methods: Participants viewed sonar images and were asked to identify mines in the images. They performed the task without ATR and with ATR at a lower and higher FA rate. The participants were split into two groups, one informed and one uninformed of the changed FA rate. Trust and/or confidence in detecting mines was measured after each block.
Results: When not informed of the FA rate, the FA rate had a significant effect on the participants’ response bias. Participants had greater trust in the system and a more consistent response bias when informed of the FA rate. Sensitivity and confidence were not influenced by disclosure of the FA rate but were significantly worse for the high FA rate condition compared with performance without the ATR.
Conclusion and application: Informing a user of the FA rate of automation may positively influence the level of trust in and reliance on the aid.

Keywords: human-automation interaction, defense, underwater mine detection, military psychology

Naval mines are low-cost, easily deployed challenge to maritime security (Ho et al., 2011; Pavlovic et al., 2012). Naval mines have been waterspace with widespread impact on commercial and military maritime activities. Many navies have developed mine countermeasures (MCM) capabilities to detect, localize, identify, and neutralize a mine threat. The detection and alizations collected with high-frequency sonars mounted on specialized ships or autonomous underwater vehicles (AUVs). Although these sensors provide visualizations with excellent resolution, distinguishing a mine from other tems also generate a substantial amount of data for an operator to parse. To aid operators with analyzing the large volumes of data, automated target recognition (ATR) systems were introduced. For MCMs, ATR algorithms evaluate the side-scan sonar visualizations for objects that cessing by the operator (Ho et al., 2011; Kessel & Myers, 2005; Myers, 2009).
Though ATR systems have the potential to tems typically operate using a high sensitivity of missing a mine (parameters of operational systems are not publicly releasable). A liberal decision criterion is often desired in a wide where the cost of missing a target is high. Unfortunately, the number of false alarms (FAs) that occur as a result tends to lead to a decrease in the trust and reliance/compliance the operators have in the aid (Geels-Blair et al., 2013; Kessel & Myers, 2005; Kessel, 2005; Rice & McCarley, 2011). FAs are more detrimental than misses due to their increased cognitive salience (Rice & McCarley, 2011). In the case where a user is in charge of monitoring both the automation and the raw data, frequent FAs may the raw data to determine if the alert is correct in identifying a signal as present (Dixon et al., 2007). In addition, a high FA rate may also lead Breznitz, 1983; 1989), where users ignore the alerts over time, even when they may be correct (Allendoerfer et al., 2008; Parasuraman & Riley, 1997; Satchell, 1993). Related to the FA enced by the positive predictive value (PPV) of the automation (Huegli et al., 2020; Manzey et al., 2014). The PPV is the probability that a signal is present when the automation says a signal is present. Operator compliance with the automation decreases with lower PPV (Bartlett & McCarley, 2017, 2019; Huegli et al., 2020). behavior include trust in automation (Dzindolet et al., 2003; Lee & Moray, 1992, 1994; Lee & See, 2004) and user self- Lee & Moray, 1994; et al., 2000; Madhavan & Wiegmann, 2004). In circumstances where the error made by the aid is obvious and detected by the user, trust and reliance may be undermined. This may be due to the user believing that they can perform better than the automated system (Madhavan et al., 2003pleted manually (Dzindolet et al., 2002; Lee & Moray, 1992, 1994; Parasuraman & Riley, 1997 -using automation.
Any changes that may occur in the level of trust the user has in the auto-gested to correspond to an associated change in automation use (Hussein et al., 2020; Lee & Moray, 1994).
Given the high expectations operators often have of automated systems (Ross et al., 2007), properly calibrating operator expectations with ability level tends to result in more appropriate reliance on and monitoring of the aid (Bagheri & Jamieson, 2004). Dzindolet et al. (2003) found that uninformed operators developed unrealiswhereas operators who were given information regarding the reliability of an automated system were able to develop more appropriate levels of trust in and reliance on the system (Bagheri & Jamieson, 2004;Du et al., 2020;Reiner et al., 2017;Wang et al., 2009). This research shows ability of an automated system, but it does not biases and disclosure patterns on the trust, reliance, and performance of users in an underwa-FA rate. Note that in operationalizing the experiment, we altered the response bias of the ATR, but rather than explaining changing response bias to our participants who may not have been familiar with signal detection theory, we informed participants about the change in the relative rates of FAs and misses. Thus, from here on, we refer to informing the users of the FA rate rather than the response bias.

Purpose
The purpose of this study was to deterown abilities, and mine detection performance whether informing the user of the FA rate of end, two groups of participants performed an an ATR system. The reliability and sensitivity of the aid was held constant across the experimental sessions, but the FA rate of the system changed part way through the study. One group of participants was informed of the change in the FA rate, and one group was not.
Hypotheses the automation would be higher during the low FA rate condition compared with the high FA rate condition and that trust would be higher when participants were informed of the automanot informed. Performance was expected to be worse for the high FA rate compared with low FA rate automation. The FA rate set (high vs. low) and whether the participant was informed of the FA rate (informed vs. not informed) were tional hypotheses were used for this measure.

Participants
Seventy adults (age: M = 20.13 ± 2.04 years; gender: 55 women, 14 men, 1 no gender specified) were recruited from the Dalhousie University community. Each participant provided informed consent and received credit points for a psychology course and a performance bonus of 10 CAD. Participants were told they would only receive the bonus if their mine detection performance was in the top 25%, but all participants received the bonus upon completion of the experiment.

Stimuli and Measures
Stimuli. Participants sat in front of a monitor and interacted with a simulation that diswas present in the sonar images. To ensure control over the number and type of mines in the images, the mines were synthetically injected based on a three-step process of sampling and characterizing the environmental noise of the image, applying that noise model to an ideal template of a mine image, and then inserting this noisy target template into the sonar data (Fawcett, 2017). This procedure ensured that physical aspects such as shadow length and specular highlight response were considered, and a reasonable noise distribution to ensure scene. The participants entered each of their where they believed a mine was present or by of the image labeled "NO MINE." rectangle appeared around a region on the sonar image if the system believed a mine was present ( Figure 1). Participants completed a Trust in Automation Questionnaire (Jian et al., 2000) following - Abilities Questionnaire was an adapted version of the Trust in Automation Questionnaire underwater mines in the sonar images (see Supplemental Materials for full questionnaires). The coordinates on the images where the par-MATLAB script to determine their response.
-tered on the true mine, it was recorded they had

Design
Participants completed four experimental and two with automation (100 trials each). In us to determine how performance changed over used a fewer number of trials due to the reduced complexity of not having the automated device.
was the added factor of the automation saying the mine was present versus absent overlayed with the mine actually being present or absent, was simply present or absent on each trial. The misses were adjusted for each of the FA rate conditions to ensure that the overall sen-bias changed. Participants completed one autorate and the other with the high FA rate. The order of presentation was counterbalanced between participants in each group (informed, not informed). For the low FA rate, 12% of the mine-absent trials had FAs: 6/50 of the trials where a mine was absent had FAs, and 11/50 of the trials where a mine was present had misses. For the high FA rate, 24% of the mine-absent trials had FAs: 12/50 of the trials where a mine was absent had FAs, and 5/50 of the trials where a mine was present had misses.
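The trial counts above are enough to check that the two aid settings differ in response bias while overall reliability and sensitivity stay roughly the same. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the helper name `aid_metrics` is hypothetical, the z-transform uses Python's `statistics.NormalDist`, and the criterion c = -(z(H) + z(FA)) / 2 is assumed as the bias index because the text does not state which index describes the aid.

```python
from statistics import NormalDist


def aid_metrics(false_alarms, misses, absent_trials=50, present_trials=50):
    """Summarize one aid setting from its error counts (hypothetical helper)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hits = present_trials - misses
    hit_rate = hits / present_trials
    fa_rate = false_alarms / absent_trials
    total = absent_trials + present_trials
    return {
        # proportion of all trials on which the aid was correct
        "reliability": (total - false_alarms - misses) / total,
        # P(mine present | aid says mine present)
        "ppv": hits / (hits + false_alarms),
        # sensitivity: d' = z(H) - z(FA)
        "d_prime": z(hit_rate) - z(fa_rate),
        # ASSUMED bias index: criterion c = -(z(H) + z(FA)) / 2
        "criterion_c": -(z(hit_rate) + z(fa_rate)) / 2,
    }


# Error counts taken from the design description above.
low = aid_metrics(false_alarms=6, misses=11)
high = aid_metrics(false_alarms=12, misses=5)

for name, metrics in (("low FA", low), ("high FA", high)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Under these assumptions, both settings are wrong on 17 of 100 trials (reliability .83) and have a d' of roughly 1.9 to 2.0. The criterion shifts from conservative (positive c) at the low FA rate to liberal (negative c) at the high FA rate, and the PPV falls from about .87 to about .79.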
Participants were randomly assigned to one of two groups. The automated (not informed) group performed the automated experimental reliability of the system, and the automated (informed) group performed the automated FA rate/reliability of the system. Participants in the automated (informed) group were informed of the FA percentage via a script (see Supplemental Materials) before starting the tri-

Procedure
The participants were told that they would play the role of a naval commander aboard a warship about to sail in uncharted waters with an unmanned submarine to identify underwater (Figure 2 for study design). The participants correct in determining if a mine was present or mine detection decisions during the experimen-During the automation introduction, the participants were informed that an ATR system had been designed to assist them in completscript explaining that the system uses computer vision to identify the presence of mines but is not entirely perfect and that the system could be altered to change the number of FAs that occur. Participants were then shown example sonar images that contained examples of a hit, miss, correct rejection, and FA by the experimenter. the participants in the informed group were told had been set to high or low. The instruction script contained the following text with the text

For this mission, your commander has set the sensitivity of the device (high/low). That means that it is expected that (24%/12%) of the trials the automation is going to have a false alarm where it says a mine is present but in fact there is no mine. However, this also means that the system may detect (fewer/more) mines.
All participants were told during the second mation was being re-calibrated and that they trials.
-Automation Questionnaire only following the

Data Analysis
Response time was calculated on each trial image appeared on the screen and when the par-Average response time was determined for the trials when a mine was present and was absent. The proportion of hits and FAs that occurred culated by dividing the number of hits and FAs determined by the number of trials in which a mine was present. The hit and FA proportions were then converted to z-scores within MATLAB using an inverse complementary error function and inputted into the following equation to calculate sensitivity:

$$d' = z(\text{hits}) - z(\text{FAs}) \qquad (1)$$

Using the calculated z-scores for hits and FAs, response bias values were also determined for equation:

and High FA rate on trust in automation and operator response bias; therefore, a 2 (Group: Informed, Not Informed) by 2 (Automation Condition: Auto Low FA, Auto High FA) mixed ANOVA was performed on response bias and trust. For measures where sphericity was violated, Greenhouse-Geisser estimates were $\eta_p^2$, was calculated for each isons were performed to follow up on any main reported.
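The z-transform and the two signal detection measures described in this section can be written out as follows. This is a sketch under stated assumptions: the identity for z(p) is the standard relation between the normal quantile function and the inverse complementary error function, the bias index is assumed to be the criterion c because the response bias equation is not legible in the text, and the hit and FA proportions in the worked lines are hypothetical values chosen for illustration.

```latex
% z-transform of a proportion p via the inverse complementary error function,
% which follows from \Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(-z/\sqrt{2})
z(p) = \Phi^{-1}(p) = -\sqrt{2}\,\operatorname{erfc}^{-1}(2p)

% sensitivity
d' = z(H) - z(FA)

% ASSUMED bias index (criterion c); negative values are liberal, positive conservative
c = -\tfrac{1}{2}\,\bigl[\,z(H) + z(FA)\,\bigr]

% hypothetical participant with H = 0.84 and FA = 0.16
z(0.84) \approx 0.99, \qquad z(0.16) \approx -0.99
d' \approx 0.99 - (-0.99) = 1.98, \qquad c \approx -\tfrac{1}{2}\,(0.99 - 0.99) = 0
```

In this illustration the hit and FA proportions are symmetric around .5, so the assumed criterion is neutral. Raising both proportions together would leave d' roughly unchanged and push c negative, which is the pattern a more liberal responder would show.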

Performance
Automation Condition on sensitivity; F(2.60, 176.6) = 68.9, p < .01, $\eta_p^2$ = .50 (Figure 3a). Mean sensitivity for No Auto 1 was found to three conditions (critical value = .11). In addition, mean sensitivity was also found to be sigcompared with No Auto 2. (Though the use of response bias and sensitivity provide a more nuanced analysis of performance than percentage correct, note that all participants performed better than chance and were on average 70%, 84%, 81%, and 82% accurate in the No Auto respectively, mirroring the sensitivity results.)

Automation Condition on response bias, F(1, 68) = 17.7, p < .01, $\eta_p^2$ = .21; however, between Group and Automation Condition, F(1, 68) = 5.58, p = .021, $\eta_p^2$ = .08 (Figure 3b). In the Not Informed Group, mean response bias
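The effect sizes reported in this section can be cross-checked against the F ratios, because partial eta squared equals SS_effect / (SS_effect + SS_error), which rearranges to F × df_effect / (F × df_effect + df_error). The following is a minimal sketch of that check. The function name `partial_eta_squared` and the row labels are illustrative assumptions, and the F values, degrees of freedom, and reported effect sizes are copied from the text above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("sensitivity: Automation Condition", 68.9, 2.60, 176.6, 0.50),
    ("response bias: Automation Condition", 17.7, 1, 68, 0.21),
    ("response bias: Group x Automation Condition", 5.58, 1, 68, 0.08),
]

for label, f_value, df_effect, df_error, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

All three computed values round to the reported ones, so the F statistics and effect sizes in this section are internally consistent.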

DISCUSSION
The purpose of this study was to determine abilities, and performance during an underwater rate. Trust in the automation was greater for the participants who were informed of the FA rate compared with the participants who were uninformed of the FA rate. The response bias of the participants in the informed group remained relatively unchanged between the low FA rate and high FA rate automation conditions, where the response bias of uninformed participants was the high FA rate condition compared with when no automation was used later in the session (No Auto 2). Participant performance (speed and indicating that they still may have been learning the discussion will mainly focus on comparing the automation conditions with the second no Trust who were informed of the FA rate is consistent with previous research that has indicated that disclosing the reliability level of an automated aid Neyedli et al., 2011;Wang et al., 2009), or providing informa-  (Bagheri & Jamieson, 2004;Lee & See, 2004;Muir, 1994), should lead to more appropriate levels of trust and reliance. Since the participants did not have previous interactions with the ATR system, those who were not informed of the FA rate information may have had to determine the ability of the system by observing its actions and decisions (Bagheri & Jamieson, 2004).
Although informing the participants of the FA rate resulted in an increase in overall trust, and high FA rate automation conditions. Trust was expected to be lower in the high FA condition because FA errors may be more cognitively salient than miss errors leading to lower perceived reliability (Rice & McCarley, 2011). According to the literature, as the reliability of an automated aid decreases, a decrease in user trust and performance is expected (Dzindolet et al., 2003;Madhavan et al., 2003;Parasuraman et al., 2000). Since the overall sensitivity of the system and reliability this may suggest that trust is less sensitive to

Participant Response Bias
Participants who were not informed of the FA rate of the automation had a more liberal response bias for the high FA compared with the low FA condition, whereas the response bias for the informed group remained relatively unchanged. These results suggest that when the tion, which may be undesirable as the actual base rate of the probability of a mine did not change. This change in user response bias to mirror that of the automation may be attributed to automation bias, or the tendency for users to overly rely on information provided by an automated aid (Dzindolet et al., 2001, 2002; Parasuraman & Manzey, 2010; Parasuraman & Riley, 1997). If a user is aware of the reliability to monitor the cues or information generated by the aid appropriately (Bagheri & Jamieson, 2004) and interfere with the system when they do not believe the cue generated is an appropriate response ( ). Furthermore, in situations where automation has a high FA rate, the user may develop a more conservative response bias to compensate for the sys-2013). Therefore, informed participants may (1) pay more attention to the cues generated by the automation, (2) be more cautious of the cues in the high FA compared with the low FA rate condition, and (3) intervene when the automated cues deviate from what they believe to be the best response. Over time, these behaviors may improve the performance of the human and automated team.

Sensitivity and Confidence
participants from both groups were found to be lower for the high FA automation condition compared with No Auto 2. According to the literature, when trust in automation exceeds a -ally (Dzindolet et al., 2002; Lee & Moray, 1992, 1994; Parasuraman & Riley, 1997). Therefore, due to the high number of hits performed by the automation during the high FA condition, along mine detection abilities, the participants may have decided to simply rely on the aid rather than monitor the images and cues appropriately.
users had similar sensitivity scores and con-conditions. Though there appears to be a small for those in the informed group (Figures 3a  and 4b decisions.

Limitations and Future Research
As with most laboratory experiments, there are potential shortfalls when applying the results to an expert population. Compared with trained sonar operators, the individuals who participated in this experiment were not familiar with ATR systems or the detection of mines.  (2020) indicates that operators are particularly sensitive to the PPV of detector systems with low base rates. As such, PPV information may better calibrate operator-automation reliance over error rates.

CONCLUSION
The purpose of this research was to determine how changing the FA rate of an ATR in a mine mance and whether informing the user of the FA rate could mitigate the potentially detrimental When users were not informed of the FA rate of the automation, the number of FAs made by ior and reduced their trust in the system. In this experiment, when the reliability of the automaby the change in response bias. Sensitivity and FA rate condition compared with manual perautomation and to encourage the user to rely on the recommendations made by the system more appropriately, information should be provided regarding the number of FA errors that are expected to occur. In addition, designers should use caution when implementing systems with

ACKNOWLEDGMENTS
This research was funded through a Social Sciences and Humanities Research Council and Department of National Defence. The authors Development Canada and the NATO Centre for Maritime Research and Experimentation for the use of synthetic aperture sonar data in the prepa-

KEY POINTS
Automated target recognition (ATR) systems are algorithm-based systems developed to assist or replace human operators in detecting targets of interest.
The purpose of this study was to determine if the FA rate of an automated target recognition system their own abilities, and mine detection perforand whether informing the user of the FA rate of
Two groups of participants performed a simulated the help of an ATR system. The reliability and sensitivity of the system was held constant, but the FA rate of the system was set at a low FA rate or a high FA rate and changed part way through the study. One group of participants was informed of the change in FA rate, and one group was not.
Trust in the automation was greater for the participants who were informed of the FA rate compared with the participants who did not receive any FA rate information.
Response bias of the participants in the informed group remained relatively unchanged between the low FA rate and high FA rate automation conditions, where the response bias of the participants in the not informed group appeared to be or a high FA rate.
be lower for the high FA rate automation condition compared with when no automation was used.
Informing users that the FA rate of the system can be changed may improve trust by providing more process-based information on how the automation operates.
Designers should use caution when implementing systems with a high FA rate due to the detriperformance.

SUPPLEMENTAL MATERIAL
The online supplemental material is available with the manuscript on the HF website.