Investigating the influence of agent modality and expression on agent-mediated fairness behaviours

With technological developments, individuals are increasingly able to delegate tasks to autonomous agents that act on their behalf. This may cause individuals to behave more fairly, as involving an agent representative encourages individuals to strategise ahead and therefore adhere to social norms of fairness. Research suggests that an audio smiling agent may further promote fairness as it provides a signal of honesty and trust. What is still unclear is whether presenting a multimodal smiling agent (by using visual and auditory cues) rather than a unimodal smiling agent as normally available commercially (using only an auditory cue e.g., Siri) could amplify the impact of smiles. In the present study, participants (N = 86) played an ultimatum game either directly with another player (control), through a smiling multimodal and unimodal agent or through a neutral multimodal and unimodal agent. Participants’ task was to offer a number of tickets to the other player from a fixed amount. Results showed that when playing the ultimatum game through a smiling multimodal agent, participants offered more tickets to the other player compared to the control condition and the other agent conditions. Hence, exploiting multisensory perception to enhance an agent’s expression may be key for increasing individuals' pro-social behaviour when interacting through such an agent.


Introduction
Autonomous agents, which can act on behalf of an individual, are becoming increasingly integrated into our day-to-day lives. These agents range from autonomous cars, drones and virtual assistants to social robots and are deployed in a myriad of contexts [22]. Research shows that individuals act differently when interacting with others via an agent than when interacting with them directly. For instance, individuals played more fairly, made higher offers and were less likely to accept unfair offers in an ultimatum game when programming an agent to act on their behalf compared to when directly interacting with the other player [23].
However, autonomous agents come in various designs. For instance, while virtual assistants like Siri and Cortana interact with humans via speech [35], social robots are embodied and can also convey visual information such as text, gestures, and even facial emotions [43,50]. While no commercially available agents convey only visual information, when combined with auditory information as exemplified by social robots, visual information could significantly impact user behaviours. Research suggests that the embodied nature of social robots may increase anthropomorphism [59] and empathy [39], which facilitates fairness behaviours in the ultimatum game [6]. Furthermore, embodied agents may also increase perceived social presence [42], which may strengthen the transmission of these social effects [5,44].
Agents are also able to induce affect. A study by Torre et al. [65] found that individuals would play more fairly in an ultimatum game if playing against an agent with a smiling voice compared to a neutral voice. While interaction through voice mirrors the interaction afforded by voice agents (e.g., Siri), what is uncertain is whether multimodal cues of smiling, echoing the mode of interaction as seen in social robots (e.g., ASIMO), could impact fairness. Existing literature suggests that incorporating congruent information across visual and audio modalities (e.g., happy facial expression with happy voice prosody or happy body movements/gestures with happy voice prosody) could increase the accuracy and speed of perceiving affect [18,20,30,54,57,67], which may amplify the effect of smiles on fairness behaviours. However, the effect of expressive audio-visual congruent information on agent-mediated fairness behaviours remains unknown.

Related works
The ultimatum game is a popular game used in behavioural economics to understand human decision making [51]. In the ultimatum game there is a proposer and a responder. The proposer receives a sum of tickets and can choose to offer an amount to the responder. If the responder accepts the offer, both parties receive the proposed allocation. Alternatively, if the responder rejects the offer, no one receives anything [51]. The ultimatum game thus raises a conflict between maximising self-gain and social motives such as fairness concerns [3,55]. Specifically, in the context of the ultimatum game, fairness could be demonstrated by making more equitable offers (i.e., an even split of tickets [23]).
According to rational choice theory, a rational responder should accept any amount offered by the proposer as this will result in a gain [51]. However, research demonstrates that proposers generally offer fair amounts, and responders will reject unfair offers [51]. A recent study conducted by de Melo et al. [23] suggests that this fairness behaviour could be facilitated when players act via an external agent. In the study, participants played an ultimatum game either by programming an agent to act on their behalf or directly with their counterpart. The results demonstrated that participants who programmed a software agent to act on their behalf in the game played more fairly and made higher offers compared to participants who played directly with their counterpart. The authors suggested that using an agent would require individuals to make decisions in advance and strategise. As such, individuals may rely on social norms, in this case norms pertaining to fairness, to enable consistency in their decision making. These results also align with Domingos et al. [26], who used the collective risk dilemma game, in which individuals balance collective gain against individual gain. When asked to program an agent to act on their behalf, individuals were shown to choose a more cooperative agent that focuses on collective gain. In addition to having to think ahead and strategise, the authors suggested that the fear of being betrayed is decreased when acting via autonomous agents, which may contribute to the increased cooperative behaviour.
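The payoff rule described above can be written down in a few lines. The following is a minimal sketch rather than code from the study: the function name `ultimatum_payoffs` is hypothetical, and the pot of 20 tickets in the usage lines is only an illustrative value.

```python
from typing import Tuple


def ultimatum_payoffs(total: int, offer: int, accepted: bool) -> Tuple[int, int]:
    """Return (proposer_payoff, responder_payoff) for a single round.

    Hypothetical helper: if the responder accepts, the proposed split is
    paid out; if the responder rejects, neither player receives anything.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and the total pot")
    if accepted:
        return total - offer, offer
    return 0, 0


# Illustrative rounds with an assumed pot of 20 tickets.
print(ultimatum_payoffs(20, 10, True))   # even split, accepted
print(ultimatum_payoffs(20, 3, False))   # low offer, rejected
```

The sketch makes the conflict explicit: a lower offer raises the proposer's payoff only if the responder still accepts it.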
When considering agent-mediated behaviours, it is important to consider the context where the behaviours take place. For instance, if responders accepted lower offers, proposers would learn to offer lower amounts [4]. However, Mussel et al. [49] found that responders were more likely to accept offers when playing against a proposer with a smiling expression, and this effect also extended to when the proposer offered an unfair amount. The emotions as a social information model [69] suggests that individuals utilise emotive cues to comprehend ambiguous situations. In a cooperative context, displaying positive affect by smiling may imply that the other party has something to gain, thus eliciting trust and cooperation [69]. In a study by Centorrino et al. [15], participants were asked to play an investment game. In each round of the investment game, senders could choose to send a sum of money to the trustee, which would be tripled. The trustee could then decide to return or not to return an amount to the sender. Following the game, senders rated genuineness of the smiles and trustworthiness of the trustees. Results show that the rated genuineness of the smiles positively predicted the perceived trustworthiness of the trustees and willingness of the senders to offer them a higher stake. This may be because a smile may convey the expression of positive affect and provide a signal for trust [62]. In addition to the ultimatum game and the investment game, smiles are shown to increase cooperation in the prisoner's dilemma [58], dictator games [13] and trust games [66], suggesting that smiles are overall effective in increasing cooperation regardless of the game involved.
As autonomous agents have the potential to interact with humans, it is important to establish mutual trust and positive social relations through these interactions [64]. Research suggests that smiling agents are perceived to be more likeable [14] and trustworthy [19]. In addition, individuals report more enjoyment and satisfaction when interacting with a smiling agent [32]. Recently, in a study by Torre et al. [65], participants played an investment game against a virtual voice agent that had either a smiling voice or a neutral voice. Participants who played against the agent with a smiling voice offered more to the agent than those who played against a neutral voice agent. Individuals who played against the smiling agent kept offering significantly more over time even when the agent was displaying unfair behaviours. The unfairness of the agent indicated that it was no longer a cooperative context, suggesting that smiling may continue to elicit cooperation regardless of context.
The study by Torre et al. [65] does not account for the effects of smiles expressed through multimodal channels. In multimodal interface design, the information conveyed across different modalities may be congruent (expressing the same affect) or incongruent (expressing different affects [45]). Research suggests that individuals recognise emotions more accurately when they are expressed congruently across different modalities in humans [18,20,30,54,57], and this may be the case for agents as well. Tsiourti et al. [67] found that participants were less likely to correctly recognise the emotional expression of a social robot when there was incongruent information across visual and auditory modalities. As such, the effect of smiling on fairness behaviours may be more pronounced in audio-visual congruent conditions than in audio-only conditions in autonomous agents.
The embodied nature of multimodal agents may also introduce other factors that may influence fairness behaviours. For instance, multimodal agents may increase perceived anthropomorphism. This is defined as the tendency to attribute human characteristics to non-human objects or agents [31]. Epley et al. [29] suggest that the tendency to anthropomorphise objects is influenced by the object's physical similarities to oneself. In a study by De Kleijn et al. [21], when playing against different robots in an ultimatum game, participants' rated levels of anthropomorphism for the robot positively correlated with preference for fairness in the game. As such, a multimodal agent, which conveys significantly more human-like traits, may increase agent anthropomorphism and subsequently increase fairness preferences. Barraza and Zak [6] found that participants offered a larger sum in the ultimatum game when primed with an empathy-inducing video. Furthermore, the size of offers positively correlated with participants' self-reported empathy towards the video. Similarly, Page and Nowak [52] found that if players offered what they themselves would accept, this caused others to eventually demand a fair split of the total sum, leading to the development of fairness. In short, this suggests that empathy induces fairness in the ultimatum game. This is important because Kwak et al. [39] found that individuals empathised more with a multimodal embodied robot compared to a unimodal disembodied robot, possibly because empathy is shown to be positively influenced by anthropomorphism [59]. Hence, employing an embodied agent may increase fairness behaviours in the ultimatum game.
Finally, multimodal agents may increase social presence [42], defined as the perception of another agent as a social actor [9]. Heerink et al. [33] found that scores of social presence whilst interacting with a robot were positively correlated with the intention to use it. In conditions of high social presence, individuals also reported higher enjoyment interacting with the robot, perceived the robot as more helpful [44] and were more likely to comply with an unusual request [5]. These findings suggest that multimodal agents may facilitate social presence, which in turn could enhance the transmission of social cues and strengthen the effect of these cues on fairness in the ultimatum game.
There was an additional control condition without involvement of an agent where participants would play directly with the responder.

Purpose of the present study
The aim of this study was therefore to examine whether agent modality (audio only and audiovisual) and expression (smiling or neutral) would impact agent-mediated fairness behaviours. We focused on audio-only and audiovisual conditions, rather than also including a visual-only condition, because of the applied aim of our research: commercially available agents such as Siri are voice-based, and a visual-only agent is rarely experienced in real-life scenarios. Based on results from de Melo et al. [23], it could first be hypothesised that participants would be fairer and offer more tickets to responders when acting via an agent (either neutral or smiling) compared to participants who play directly with responders (control condition). Second, as smiling could consistently elicit cooperation regardless of context [65], it could be hypothesised that participants in the smiling agent conditions would make higher offers to responders in comparison with participants in the neutral agent conditions. Lastly, within the smiling agent conditions, it could be predicted that participants in the multimodal agent condition would offer more to responders than participants in the unimodal agent condition.

Design
This study employed a mixed factorial design (for the different agent conditions), with an additional control condition (no agent). The within-subjects independent variable was agent modality (unimodal vs multimodal). The between-subjects independent variable was agent expression (smiling vs neutral). The dependent variable was the sum of tickets offered by participants to responders. The conditions of the study are summarised in Table 1.

Participants
Fluent English speakers (N = 98) older than 18 years old with normal or corrected-to-normal vision and hearing participated in the study. However, participants who did not complete the consent form (N = 10) and statistical outliers (N = 2) were removed (see the Analysis section for details on how outliers were defined), resulting in 86 participants in total. Of the sample, 16 were male, 69 were female and 1 preferred not to say. This breaks down to 13 females and 3 males in the control condition, 23 females and 10 males in the neutral condition, and 33 females, 3 males and 1 participant who preferred not to say in the smiling condition. Participants were on average 22.1 years old (SD = 7.7). They were recruited via word of mouth or in exchange for course credits. Otherwise, participants were told that the tickets they gained in the game could be put towards a lottery to win a £25 Amazon gift card, as in de Melo et al. [23]. This study received full ethical approval from the Psychology Research Ethics Committee (PREC) of the University of Bath (ref. UG 20-095).

The game
An ultimatum game that participants could play on their browser was created. This was developed using Unity, a game engine, and hosted on itch.io, an online distribution platform for independent game developers to publish their own games.
In the first scene of the game, participants would enter their participant ID and press play to start. Participants were then shown a text description of the rules of the game and could press continue to begin playing. Then, they were directed to a screen that said "connecting to server" for a random duration of between 20 and 30 s to build the illusion that they were connecting to another player. When "connected", participants believed that they were playing against player B, which was actually a programmed bot that rejected participants' offers with predefined probabilities. In de Melo et al. [23], participants were given 20 tickets in each round; offers greater than or equal to 10 tickets were always accepted, offers of 8-9 tickets were accepted 75% of the time, offers between 4 and 7 tickets were accepted 25% of the time, and lower offers were rejected. However, automatically accepting offers of 10 or above could cause participants to learn that they could repeatedly offer an even split or more for a consistent gain, negating any effects of experimental manipulations. As such, a stricter acceptance criterion was adopted: offers larger than or equal to 16 were always accepted, offers between 10 and 15 were accepted 75% of the time, offers between 5 and 9 were accepted 50% of the time, offers between 2 and 4 were accepted 25% of the time, and an offer of 1 was rejected.
The use of a bot in place of a human is a common method used in many studies comparing human-human versus human-computer interaction to increase experimental control [24,60]. Research suggests that players' decisions in the ultimatum game may be influenced by reciprocity, meaning that there are higher acceptance rates for higher offers [10], and higher rejection rates for lower offers [71]. As such, keeping the responder's acceptance rate constant ensures that any differences in participants' offers are not due to participants reciprocating the responder's actions.
On the screen, there was a line of buttons where participants could express their offer to player B (see Fig. 1). After expressing their offer (see Fig. 2), participants were told whether player B had accepted or rejected their offer (see Fig. 3), and another round began. Player B's response was similarly delayed by a random duration of between 5 and 20 s to create the illusion that player B was deciding. Participants always played as the proposer. Participants played a total of 2 sets of games, with 10 rounds in each set. They were told that they were playing against a different player B in each set and were similarly taken to the "connecting to server" screen before starting the new set.
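The revised acceptance criterion amounts to a step function from offer size to acceptance probability. The sketch below illustrates that rule in Python and is not the Unity code used in the study; the names `acceptance_probability` and `responder_accepts` are hypothetical, and the treatment of an offer of 0 as always rejected is an assumption, since the text only specifies offers of 1 and above.

```python
import random
from typing import Optional


def acceptance_probability(offer: int) -> float:
    """Probability that the scripted responder accepts an offer out of 20."""
    if not 0 <= offer <= 20:
        raise ValueError("offer must lie between 0 and 20 tickets")
    if offer >= 16:
        return 1.0
    if offer >= 10:
        return 0.75
    if offer >= 5:
        return 0.50
    if offer >= 2:
        return 0.25
    # An offer of 1 is rejected per the text; rejecting 0 is an assumption.
    return 0.0


def responder_accepts(offer: int, rng: Optional[random.Random] = None) -> bool:
    """Draw one accept/reject decision for the given offer."""
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so probability 1.0 always accepts
    # and probability 0.0 always rejects.
    return rng.random() < acceptance_probability(offer)


# Example: simulate ten rounds of an even split with a seeded generator.
rng = random.Random(2024)
decisions = [responder_accepts(10, rng) for _ in range(10)]
print(decisions)
```

Because the probabilities depend only on the current offer, the responder's behaviour stays constant across rounds and conditions.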
Participants were randomly allocated to either the smiling, neutral or control condition. Participants in the control condition played the game with textual instructions on the screen and with a player B icon as the responder (see Figs. 1, 2, 3). This is similar to the game interface in de Melo et al.'s [23] study. For the rest of the participants, the area of player B's icon was replaced by an agent with a smiling or neutral expression. Furthermore, in each of those conditions, participants were exposed to both a unimodal and a multimodal agent to account for the individual differences in multimodal perception [56]. The order with which each participant underwent the unimodal and multimodal condition was counterbalanced (e.g., one participant did the first set of the game with the unimodal agent and then the second set with the multimodal agent while the next participant did the opposite).
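The counterbalancing of modality order can be illustrated with a simple alternation scheme. This is only a sketch under the assumption that order alternates with a participant's enrolment index; the function name `modality_order` and the use of an index are hypothetical and not taken from the study materials.

```python
from typing import Tuple


def modality_order(participant_index: int) -> Tuple[str, str]:
    """Return the agent modality for (first set, second set).

    Assumed scheme: even-indexed participants start with the unimodal
    agent, odd-indexed participants start with the multimodal agent.
    """
    if participant_index % 2 == 0:
        return ("unimodal", "multimodal")
    return ("multimodal", "unimodal")


# Consecutive participants receive opposite orders.
for index in range(4):
    print(index, modality_order(index))
```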
Agent appearance
For the unimodal agent, a simple sound wave was chosen. The sound wave was taken from an online model [12] and modified. The wave moved in sync with the audio instructions played in the background to provide participants with signs that the agent was speaking. Existing voice agents such as Siri, Cortana and Alexa (refer to Appendix A for images) display visual cues of speaking, which is why we also presented simple visual information alongside the voice agent rather than no visual information at all. The voice wave was in a light blue colour, which is the colour adopted by Cortana's and Alexa's pulsing circular visual cue (Fig. 4). While adopting similar design features to existing voice agents may increase the applicability of our results, the sound wave still conveys some visual information and thus may not be strictly unimodal. However, the sound wave lacks human-like characteristics and a body representation. This decreases the potential levels of empathy [29], anthropomorphism [21] and social presence [42] experienced.
Fig. 4 Game interface in the unimodal agent condition. Note: this was identical to the game interface in the control condition, except for the icon of Player B, which was replaced with the visual sound wave (called here the unimodal agent). The text was also removed from the interface as it was played in the background as audio.
For the multimodal agent, a free online model [70] was used and modified (see Fig. 5). This model was chosen due to its similarities with commonly seen social robots such as REEM and ASIMO (refer to Appendix A for images). The model was animated to have blinking eyes to bring it some liveliness and increase the level of anthropomorphism. For the mouth, the same voice wave as in the unimodal agent was used, which similarly moved in sync with the audio instructions played in the background.
For the smiling condition, the agent adopted a smiling face when the instructions finished (Fig. 6). For the neutral condition, the agent adopted a neutral face when the instructions finished (Fig. 7).

Agent voice
The majority of software agents employ a female voice. In addition, individuals usually set the language of their phone to their native language. As this research took place in the United Kingdom, a predominantly English-speaking country, a British female who spoke standard southern British English was chosen to be the voice of the agent. In Torre et al. [65], researchers exposed the voice actors to amusing and happy stimuli and asked the actors to smile when recording the voice for the smiling agent. For the neutral agents, researchers asked the actors to use their normal speech. The same instructions were therefore given to the British female voice actor during the voice recording of the smiling and neutral agent.
The phrases recorded facilitated the participant's actions in the game. For instance, the agent would ask what the participant would like to offer (i.e., "How many tickets would you like to offer to player B?") or would describe the consequences of the participant's offer (i.e., "you have offered x tickets to player B, player B is deciding…player B has rejected/accepted your offer"). This created the impression that the agent conveyed information between the participant and player B.
The numbers of tickets from 0 to 20 were recorded and used in the phrases in the game corresponding to the participant's offers and the tickets they gained in that round.

Procedure
Participants were sent a link to a start survey, where they were given information about the study, and were asked to provide informed consent. At the end of the survey, they were randomly allocated to either the control, smiling or neutral condition and a participant ID was generated for them. They were provided with the link to the game and were asked to access the game in a quiet location and to enter the ID to start the game (see material). Once participants completed the game, they could click the finish button on the screen that saved their responses and redirected them to an end survey. This debriefed the participants, providing them with information on the background and aims of the study, the different conditions and how they were allocated to their condition. Furthermore, they were provided with links to related works and contact details of the researchers if they had any questions. For participants that did not receive course credits, the debrief also provided them with the option to enter a lottery if they wished to. Finally, participants were thanked for their participation and could end the study by closing their browser.
Fig. 6 Smiling agent when no instructions were playing
Fig. 7 Neutral agent when no instructions were playing. Note: While the mouth of the agent here is black to make it more visible in the figure, the agent had a light blue mouth in the actual game to mimic the colour used in the unimodal (sound wave) condition and in commercially available voice assistants

Analysis
The acoustic features from the voice recordings were first analysed to ensure that a difference in affect was indeed present. Torre et al. [65] measured the fundamental frequency (F0), the average frequencies of the first three formants and the spectral centroid of their neutral and smiling speech recordings and found that all these characteristics were significantly higher for the smiling than the neutral speech. However, consistent evidence on the differences between smiled and neutral speech is only found for F0 [7,41]. Regarding formant patterns, smiling in voice may be characterised by a general rise in the first three formants [28], or an increase in only F2 [7]. Alternatively, Drahota et al. [27] suggest that it is the spread between F1 and F2 and between F2 and F3 that influences perception of smiles in speech. Lastly, whilst smiled speech is found to correlate with higher spectral centroid [2], the evidence on this is limited. As such, whilst the same audio features as in Torre et al. [65] were analysed, only F0 characteristics were used as the main proxy differentiating smiling and neutral speech. However, the analyses for the spectral centroid and the average frequencies of the first three formants are included in Appendix B for reference.
As such, the same audio features of the smiling and neutral recordings were compared. Comparisons were conducted with paired samples t-tests, except for the data of the spectral centroid for the smiling (W (42) = 0.69, p < 0.001) and neutral clips (W (42) = 0.63, p < 0.001) which significantly violated assumptions of normality, so a related samples Wilcoxon signed rank test was used instead.
Statistical outliers were identified as data points exceeding 1.5 times the interquartile range. However, extreme values may simply reflect individual differences in fairness [51]. Therefore, for each statistical outlier, the offers the participant made in each round were examined, and data points were only removed if the same offer was given for more than half of the rounds, as this indicated that the participant was not playing the game seriously. As a result, two outliers were removed. Due to technical issues with the survey randomisation, there was a slight skew in the sample size for each condition, resulting in 16, 33 and 37 participants in the control, neutral and smiling conditions respectively.
The average, minimum and maximum tickets offered by participants in each condition were calculated. For participants in the control condition, an average was taken across tickets offered in both sets of the game. For the rest of the participants, the average tickets offered in each set was taken, corresponding to the multimodal and unimodal agent conditions. Mean tickets offered by participants in each of the agent conditions were compared to tickets offered in the control condition with a one-sample t-test. This tested whether tickets offered in the neutral and smiling conditions differed from the control, corresponding to the first and fourth hypotheses. A Shapiro-Wilk test demonstrated that data in the smiling unimodal condition (W (37) = 0.93, p = 0.020) slightly deviated from normality. However, Bartlett [8] suggests that one-sample t-tests are robust to mild violations of normality; therefore, one-sample t-tests were still used for analysis. To account for errors in multiple comparisons with the control condition, a Bonferroni correction was applied. To test the second and third hypotheses, a mixed factorial ANOVA was carried out, which enabled the main effects of expression and modality, and the interaction effect between the two factors, to be tested. Further, Krys et al. [38] found that smiling individuals were rated as being more honest by females than males, indicating that gender may impact how smiles are perceived. As only one participant, in the smiling group, reported "prefer not to say" as a gender choice, we excluded this participant and ran the same mixed factorial ANOVA with gender as a covariate to examine its effect on the number of tickets offered by participants.
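The two-step screening described above (flag statistical outliers, then remove only those who repeated one offer in most rounds) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the fences are taken to be the conventional Tukey fences (more than 1.5 times the interquartile range beyond the quartiles), the screening is applied to each participant's mean offer, the quartile method is Python's "inclusive" method, and all function names and the example offers are hypothetical.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, List


def tukey_outliers(values: Dict[str, float]) -> List[str]:
    """Return the keys whose value lies beyond 1.5 x IQR from the quartiles."""
    q1, _, q3 = quantiles(list(values.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [key for key, value in values.items() if value < low or value > high]


def repeats_same_offer(offers: List[int]) -> bool:
    """True if a single offer value was given in more than half of the rounds."""
    most_common_count = Counter(offers).most_common(1)[0][1]
    return most_common_count > len(offers) / 2


def participants_to_remove(offers_by_participant: Dict[str, List[int]]) -> List[str]:
    """Flag outliers on mean offer, then keep only those who repeated one offer."""
    means = {pid: sum(o) / len(o) for pid, o in offers_by_participant.items()}
    return [
        pid for pid in tukey_outliers(means)
        if repeats_same_offer(offers_by_participant[pid])
    ]


# Made-up offers for six hypothetical participants (not study data).
example_offers = {
    "a": [10, 9, 10, 11, 10, 9, 11, 10, 10, 10],
    "b": [8, 9, 10, 9, 8, 9, 10, 9, 9, 9],
    "c": [11, 12, 11, 10, 11, 12, 11, 10, 11, 11],
    "d": [10, 11, 9, 10, 12, 8, 10, 11, 9, 10],
    "e": [20] * 10,                         # extreme and always the same offer
    "f": [0, 1, 2, 0, 1, 3, 0, 2, 1, 0],    # extreme but varied offers
}
print(participants_to_remove(example_offers))
```

In this made-up example both "e" and "f" are statistical outliers, but only "e" would be removed, because "f" varied their offers and may simply have a low fairness preference.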
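The Bonferroni correction mentioned above can be illustrated with a short sketch. It assumes the correction is applied across the four agent-versus-control comparisons by multiplying each p-value by the number of comparisons; the function name `bonferroni_adjust` is hypothetical and the p-values in the example are placeholders, not results from this study.

```python
from typing import Dict


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Placeholder p-values for the four comparisons with the control condition.
placeholder_p = {
    "neutral unimodal": 0.2,
    "neutral multimodal": 0.5,
    "smiling unimodal": 0.01,
    "smiling multimodal": 0.003,
}
for name, adjusted in bonferroni_adjust(placeholder_p).items():
    print(f"{name}: adjusted p = {adjusted:.3f}")
```

Equivalently, each unadjusted p-value can be compared against the significance threshold divided by the number of comparisons.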

Results
A paired-samples t-test showed that, as expected, the fundamental frequency of the audio recordings was significantly higher in the smiling (M = 239. 28 18. For details about the other formants and spectral centroid results see Appendix B. The result for F0 demonstrates that the stimuli used for the present study did effectively differentiate between smiling and neutral speech.
To examine whether having an agent in the ultimatum game would result in more tickets offered by participants to the other player, we compared all four agent conditions (neutral unimodal agent, neutral multimodal agent, smiling unimodal agent, smiling multimodal agent) to the control condition, which had no agent mediating the participants' offers. For a summary of descriptive statistics, see Table 2.
A mixed factorial ANOVA with agent expression (smiling, neutral) as between-subjects factor and agent modality (audio, audiovisual) as within-subjects factor was then carried out to examine whether the presence of smiling cues and of both visual and auditory information would significantly increase the number of tickets offered by participants to the other player. We found no significant main effect of agent expression (F(1, 68) = 1.09, p = 0.299, ηp² = 0.016). Furthermore, no main effect of agent modality was found (F(1, 68) = 1.21, p = 0.275, ηp² = 0.017). However, the analysis returned a significant interaction between agent expression and modality (F(1, 68) = 4.31, p = 0.042, ηp² = 0.060). Pairwise comparisons, Bonferroni corrected, showed that tickets offered by participants in the neutral unimodal and neutral multimodal condition were not significantly different (p =
As at least one participant in each condition offered the lowest (0) and highest (20)
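The reported effect sizes can be cross-checked against the F statistics, since partial eta squared can be recovered from F and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity to the three ANOVA effects reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics as reported in the text, each with df = (1, 68).
reported = {"expression": 1.09, "modality": 1.21, "interaction": 4.31}
for effect, f_value in reported.items():
    print(effect, round(partial_eta_squared(f_value, 1, 68), 3))
```

The recomputed values round to 0.016, 0.017 and 0.060, matching the effect sizes reported for expression, modality and their interaction.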

Discussion
This study investigated whether the modality and expression of software agents could impact software agent-mediated fairness behaviours using the methodology by de Melo et al. [23], but with the extension of allowing participants to play the ultimatum game via an agent that differed in expression and modality. Results show that tickets offered by participants in the control condition were similar to tickets offered in all the agent conditions except the smiling multimodal condition. Furthermore, a smiling voice agent did not increase the number of tickets offered by participants, and similarly, a multimodal neutral agent did not result in more tickets given to responders compared to a unimodal agent. However, a combination of a smiling and multimodal agent did significantly increase tickets offered by participants to responders. The lack of difference in tickets offered by participants in the control versus neutral agent conditions indicates that the first hypothesis should be rejected. The first hypothesis predicted that participants in the neutral agent conditions would offer more tickets compared to the control, this is because de Melo et al. [23] demonstrated that acting via an agent may cause participants to deliberate over their actions and thus act more fairly. The lack of difference between the neutral agent and control conditions suggests that participants may not have significantly considered their actions. This may be due the level of participant's interaction with the agent throughout the game. In de Melo et al. [23], participants were able to program the agent in advance, meaning that they expressed their acceptance thresholds prior the game began. However, the current study allowed participants to decide their offers after each round mediated by the agent. This may have provoked a short-term style of decision making and cause participants to not take long term consequences into consideration.
On the other hand, the fourth hypothesis predicted that participants in the smiling condition would offer more tickets than participants in the control. This hypothesis is partially supported by the findings of the current study as a significant increase of tickets offered was found in the smiling multimodal condition compared to the control. This is because smiling may further induce cooperation when acting via an agent [65]. However, the lack of impact from the unimodal smiling condition suggests that this finding may be attributed to the multimodal nature of the agent in conjunction to the smiling expression. This because a multimodal smiling agent significantly increased the number of tickets offered to player B not only compared to the control condition with no mediating agent but also compared to all the other agent conditions. This may be explained by results in Collignon et al. [18], who demonstrated that participants are able to categorise emotions quicker and more accurately with audio-visual than audio or visual only stimuli. It is thus possible that participants in the unimodal smiling condition did not perceive the smiling nature of the agent sufficiently due to the scarcity of signals communicated through audio alone, as such, it was not effective in significantly change participants' fairness behaviours. On the contrary, the richer information conveyed in the multimodal smiling condition both audibly and visually, strengthened the accuracy and speed of perceiving smiles [18] and subsequently increasing the tickets offered by participants. However, we cannot be sure that both audio and visual smiling expression contributed to the significant effect of the smiling multimodal agent on participants' behaviour. It is possible that participants did not combine information across both visual and auditory modalities equally to interpret affect in the multimodal smiling condition. 
Previous studies (e.g., [18]) suggest that when one of the cues has low reliability, individuals rely mostly on the more reliable information for their judgments. As the multimodal smiling agent had a visual smile, it could be argued that this conveyed clearer information of a "smile" compared to the possibly ambiguous audio modality, which may have contributed to the increased fairness demonstrated in the multimodal smiling condition. This, coupled with the lack of a visual-only condition in our study (dictated by the focus on the effect of available software agents on participants' behaviours), may limit the interpretation that fairness behaviours are enhanced by the multimodal nature of the smile, as the visual smile alone may have enhanced fairness behaviours. This, nevertheless, forms a complementary finding to Torre et al. [65] and suggests that mediating agents conveying smiles through either voice prosody or facial expressions have positive effects on fairness behaviour.
The lack of effects of a smiling unimodal agent on participants' fairness behaviours raises an interesting issue, as Torre et al. [65] similarly employed a smiling voice agent and found that it significantly increased fairness behaviours in participants. While this study analysed the same audio characteristics as Torre et al. [65], we only found a significant increase in F0, but no significant increase in the average frequencies of the first three formants or in the spectral centroid, as reported in Torre et al. [65]. It is possible that the lack of impact of the smiling unimodal agent could be attributed to the difference in audio stimuli between our study and that of Torre and collaborators. Additionally, while, as expected [65], we found a significant difference between the smiling and neutral voice conditions for the fundamental frequency, we cannot be sure that this resulted in differences in perceived expression, as the latter was not measured. As such, it remains unclear whether participants actually perceived smiling in the unimodal smiling condition, which may have contributed to the similar levels of fairness between the neutral and smiling unimodal conditions.
The current study also did not measure the user experience of participants when interacting with the agent. As mentioned previously, individuals may find interacting with a smiling agent more enjoyable and satisfying [32], and smiling agents may be seen as more likeable [14] and trustworthy [19]; the extent to which this could have influenced fairness behaviours in the participants of the current study is uncertain. In light of these considerations, there is an opportunity to further validate the potential impact of combining both visual and audio smiles on fairness behaviours. While the current study did not include a visual-only agent condition, as it does not reflect the most commonly available agents, there is value in future research including a visual-only condition to further understand multisensory integration in autonomous agents and how this affects fairness behaviours. In addition, it is important to measure the perceived effect of the audio stimuli to validate participants' accuracy in categorising the affect expressed, as well as users' perceptions of the smiling agent, to help isolate the factors that contribute to fairness behaviours.
On a similar note, there remains room for interpretation regarding what constitutes a multimodal agent. In line with the literature from Collignon et al. [18], it is possible that providing an agent with just a face would be sufficient to improve the accuracy and speed of affect perception, and that a body, as implemented in the current study, would not be required. However, works by Meeren et al. [46], Van den Stock et al. [68] and Piwek et al. [57] demonstrate that affect-congruent body expressions also contribute to increasing the accuracy of affect recognition. Full body information was introduced in the current study as it may improve empathy, anthropomorphism and social presence. In fact, the multimodal agent used here may have also increased social presence [42], which facilitates the transmission of social cues, amplifying the impact of smiles on fairness. Furthermore, due to the embodied agent presenting increased human-like characteristics [29], participants may have further anthropomorphised the smiling agent and thus increased their fairness in the game [21]. Lastly, an embodied robot may have increased empathy in participants via inducing anthropomorphism [39], thus increasing tickets offered in the game. Future research could investigate the separate and joint contributions of different cues, such as facial expressions, body gestures and movements as well as voice prosody, to agent effectiveness in enhancing prosocial behaviour and fairness.
Additional limitations of this study also need to be discussed. First, participants could either receive course credits for their participation or have a chance to enter a lottery with the tickets they had gained in the game. Although this study assumed that all participants were incentivised to gain as many tickets as possible, participants entering the lottery may have been more motivated to maximise their number of tickets, as it would contribute to a bigger chance of winning the lottery. In contrast, participants gaining course credits may have been less motivated to obtain a higher number of tickets, as these had no effect on the number of course credits they were given. This difference in incentive may have created variations in fairness behaviours among participants. Second, due to a technical issue, the study assigned a different number of participants to the different conditions. The imbalance in sample size among the different conditions made the study underpowered, thus limiting the study's conclusions. Another limitation is that there may be individual differences in how smiles were perceived. Unlike Krys et al. [38], who suggested that smiling individuals were rated as more honest by females than by males, the analysis did not reveal any gender differences in the number of tickets offered in the agent conditions. However, the results could be limited by the significant skew of the gender distribution in the sample (16 males, 69 females), leaving it unclear whether gender could have an effect. This suggests an opportunity to replicate the study with a more even gender distribution to further validate the effects of smiling when considering individual differences. Lastly, participants in the current study only played as the proposer, so it remains unclear how multimodal smiling agents could affect participants if they played as the responder.
Unlike proposers, responders also display altruistic punishment, whereby they reject unfair offers as a means to punish proposers while sacrificing their chances of gaining anything [61]. Hence, further studies could examine whether expressive cues such as smiling, and their multimodal (compared to unimodal) representation, would affect fairness behaviours similarly when participants play as proposer and as responder.
Finally, it is important to discuss the ethical implications of this study, as its findings suggest that smiles could influence an individual's decision making. Within the study, it is possible that participants may have felt guilty when they offered less than what is conventionally accepted as fair, as the smile from the agent may have cued them to act fairly. While it may be helpful to facilitate cooperation and fairness from the individual, it may not always be in the individual's interest to act fairly, such as when it is not reciprocated. In a broader context, this links to the wider ethical consideration of using autonomous agents to nudge human behaviour [40]. When making decisions, humans account for multiple factors such as their values, beliefs and culture; however, there remain challenges in developing autonomous agents that behave similarly and that could adequately represent human interests [16]. This suggests that consideration should be given to where smiles are implemented in autonomous agents, particularly in contexts where they could impact individual outcomes. For instance, customer service robots could be designed to smile to increase cooperation and fairness in customers [48], or smiles could be implemented in healthcare social robots to increase compliance in taking medication [43]. However, there is uncertainty as to whether implementing smiles in shopping assistants could be appropriate [11,47], as this may influence an individual's financial outcomes and general health if they do not act in their best interests.
Despite the limitations discussed, the current study extended the existing literature on multimodal affect perception and investigated it in the novel context of agent-mediated behaviours. The results demonstrate that a visual smile may facilitate fairness in individuals and provide initial evidence that multimodal-congruent information may increase affect recognition for agents, as it does for humans. In addition, this study demonstrates that implementing an agent displayed on a screen may be sufficient, triangulating with results from Li [44] and Thellman et al. [63], suggesting that a physical robot may not be required for the expected results. As a virtual agent requires considerably less hardware than social robots, it could be more easily deployed and could be a more cost-effective option [36]. This may be preferred in digital healthcare and telemedicine, where deploying autonomous agents is becoming popular due to the increased demand for accessible and cost-efficient patient care [25].
To conclude, this study investigated the impact of software agent expression and modality on participants' fairness in agent-mediated behaviours using an ultimatum game. Whilst a multimodal smiling agent was found to increase participants' fairness behaviours, there remains uncertainty about how participants integrated and perceived multimodal affect from the agent, which could open future avenues for research. This may be important, as the results could inform the design of software agent representatives and further the understanding of the mechanisms underlying the difference in behaviours between agent-mediated and direct interactions.

Conflict of interest: The authors declare no conflict of interest.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.