1 Introduction

Text-based conversational agents – commonly referred to as chatbots – are software systems designed to interact with humans using natural language (Dale 2016). Many organizations use chatbots to respond to customer service requests, provide personalized product information, or support customers in their purchase decisions (Sheehan et al. 2020). However, despite extensive interest in chatbots as a promising technology for customer interaction, their adoption and use by customers is growing much more slowly than expected (Grudin and Jacques 2019; Nordheim et al. 2019). A key reason cited for such slow adoption and limited use is that interacting with chatbots often does not feel natural and human-like (Schuetzler et al. 2014, 2020; Go and Sundar 2019).

A common way of rendering human–chatbot interactions more natural and human-like is to employ social cues in chatbot design. In the context of conversational agents, social cues are design features that trigger emotional, cognitive, or behavioral user reactions similar to those observed in the interaction between humans (Feine et al. 2019). Research has shown that users’ natural tendency to respond to these cues promotes social presence perceptions, giving users a sense of personal, sociable, and sensitive human contact during the interaction, which in turn increases adoption and use (Hassanein and Head 2007; Hess et al. 2009; Qiu and Benbasat 2009). Research based on social response theory (SRT) (Nass et al. 1994; Nass and Moon 2000) has linked a broad range of social cues from chatbots, such as human-like avatars, small talk, or name tags, to positive user perceptions and behavior (e.g., Araujo 2018; Diederich et al. 2020; Benlian et al. 2020; Seeger et al. 2021).

However, the picture is less clear for response time – a vital social cue in technology-mediated interaction between humans (Walther and Tidwell 1995; Jacquet et al. 2019) and a key factor in website and mobile application usability (Galletta et al. 2006). According to the prevalent view in information systems (IS) literature, users do not readily tolerate slow response times of websites and mobile apps (Galletta et al. 2004; Yu et al. 2020). Nevertheless, some studies have found website delays to be beneficial, for example, in signaling the effort invested in generating product recommendations (Buell and Norton 2011; Tsekouras et al. 2022). In the context of chatbots, the role of response time is even less clear. Unlike human counterparts who need time to read a message and enter a response, chatbots can instantly process user input and generate a response (Schuetzler 2015; Følstad et al. 2018). Yet, some studies suggest that instant responses make a chatbot appear unhuman-like (Holtgraves and Han 2007), reduce the feeling of a natural conversation (Appel et al. 2012), and decrease user satisfaction (Gnewuch et al. 2018). In contrast, other studies show that chatbots with delayed responses elicit negative personality attributions (Holtgraves et al. 2007) and are perceived as less likeable (Schanke et al. 2021). Among practitioners there is also no consensus on this matter. While some designers intentionally delay chatbot responses to make them appear more human-like (e.g., Lufthansa’s Mildred; see Crozier 2017), others stress the importance of instant responses (e.g., SysAid Technologies 2019). Hence, this study sets out to disentangle the opposing effects of chatbot response time in the extant literature. We particularly focus on users’ prior chatbot experience as a factor that might explain how chatbot response time affects user perceptions.

Recent studies suggest that users transfer expectations from their prior experience with other chatbots to their current interaction with a chatbot (Moussawi et al. 2020; Grimes et al. 2021). Therefore, users with different prior chatbot experience are likely to have different expectations of chatbots in general and their response time in particular. For example, novice users who can only draw on their experience of chatting with humans might expect a longer response time than experienced users who have used other chatbots before. Consequently, a delayed response time may or may not violate users’ expectations depending on their prior chatbot experience. Since, according to expectancy violations theory (EVT) (Burgoon 1978, 1993), negative violations of expectations lead to negative outcomes, differences in users’ prior chatbot experience could explain the inconsistent findings on the impact of chatbot response time in the literature. Against this backdrop, our study draws on SRT and EVT to investigate the questions of (1) how chatbot response time influences users’ social presence perceptions and their chatbot usage intentions, and (2) how prior experience with chatbots moderates these relationships.

To address these questions, we conducted a lab experiment (N = 202) in which novice users (i.e., users who have not interacted with a chatbot before) and experienced users (i.e., users who have used chatbots before) interacted with a chatbot that responded either instantly or with a delay. In line with our reasoning, our results show that a delayed chatbot response time has opposing effects on social presence for novice and experienced users. While a delayed (as opposed to instant) response time positively influences novice users’ social presence perceptions, the effect is negative for experienced users. Further, we find that social presence mediates the effect of chatbot response time on usage intentions, and that this mediation is moderated by prior chatbot experience such that the indirect effect of a delayed response time on usage intentions is positive for novice users and negative for experienced users. Finally, our results show that prior chatbot experience moderates the effect of social presence on usage intentions such that the effect is stronger for novice users than for experienced users.

Our study contributes to IS literature in three ways. First, we extend SRT by identifying prior experience as a key moderating factor that shapes users’ social responses to chatbots. More specifically, by revealing opposing effects of a delayed response time for novice and experienced chatbot users, we shed a more differentiated light on SRT’s core assumption that social cues more closely resembling human behavior (e.g., a delayed response time) trigger social responses in all users, regardless of their individual characteristics. Second, we offer an explanation for inconsistent findings regarding the role of response time in the context of chatbots, websites, and mobile apps by introducing users’ prior experience as an important contingency factor. In particular, our study helps reconcile inconsistencies in the literature by clarifying the conditions under which a chatbot’s instant or delayed response times result in positive outcomes – instant responses for experienced users and delayed responses for novices. Finally, while previous research has mostly focused on verbal and visual cues (e.g., human-like avatars), we extend this literature stream by considering the impact of response time – a cue that falls in the category of chronemic cues (Feine et al. 2019).

Our study offers several practical implications for individuals and organizations who design, develop, or implement chatbots and other types of conversational agents (e.g., voice assistants). First, our findings suggest that a “one-design-fits-all” approach could be one reason for the ongoing struggle to meet user expectations. Since expectations differ among user groups and appear to evolve as users gain more experience with chatbots, practitioners might explore chatbots that can be adapted based on user characteristics and preferences. Second, our study highlights the need for sensitivity to seemingly minor chatbot design features and their impact on user perceptions. Since these features are easily overlooked, practitioners could, for example, extend their development teams to include language experts who can contribute knowledge of human conversation.

2 Theoretical Foundations and Related Work

2.1 Chatbots

Chatbots are conversational agents that rely on natural language in the form of text messages (Følstad and Brandtzæg 2017; Xu et al. 2017). Although the first chatbot, ELIZA (Weizenbaum 1966), was developed as early as the 1960s, it was not until the 2010s that chatbots attracted broader organizational interest (Grudin and Jacques 2019; Seeger et al. 2021). Beginning in 2016, the growing excitement regarding artificial intelligence (AI) led to the development of more than 300,000 chatbots on Facebook Messenger alone (Araujo 2018; Facebook 2018). Common application areas are customer service, e-commerce, and health care (Følstad and Brandtzæg 2017; Adam et al. 2020), as well as workplace employee support (vom Brocke et al. 2018; Mirbabaie et al. 2021). However, despite their growing availability, customers are slow in adopting chatbots (Følstad et al. 2018; Nordheim et al. 2019). As a number of high-profile failures have demonstrated, many chatbots were unable to live up to their promises and have disappeared (Ashktorab et al. 2019; Grudin and Jacques 2019). Consequently, researchers and practitioners alike have realized that chatbot design and development is not only a technical challenge, but also needs to consider elements known to influence human–human interaction (Jenkins et al. 2007; Følstad and Brandtzæg 2017; Pfeuffer et al. 2019).

2.2 Social Response Theory (SRT)

In human–human interaction, a person perceives, interprets, and responds to a wide array of social cues (e.g., facial expressions, gestures) (Burgoon et al. 2010). Based on the underlying Computers are Social Actors paradigm, SRT posits that users respond in a similar way to social cues from technology (e.g., natural language, human-like appearance) (Reeves and Nass 1996; Nass and Moon 2000). Even minimal social cues (e.g., a name tag) can trigger social responses in users (Nass et al. 1994; Nass and Moon 2000). These responses occur automatically and unconsciously without being “confined to a certain category of people” (Nass and Moon 2000, p. 98). When users respond socially to a technology, they unconsciously categorize it as a relevant social actor and ascribe human attributes to it, which enhances perceptions of social presence (Nass and Lee 2001).

Short et al. (1976) coined the term social presence to describe the extent to which a communication medium allows users to experience others as being psychologically present. It was originally defined as “the degree of salience of the other person in a mediated communication and the consequent salience of their interpersonal interactions” (Short et al. 1976, p. 65). However, its use has been extended since then to articulate how feelings of warmth, human contact, and sociability are created without actual human contact (Gefen and Straub 2004).

Building on SRT, various social cues from a host of different technologies have been linked to social responses in users (e.g., Hess et al. 2009; Qiu and Benbasat 2009). As for chatbots, researchers have primarily investigated verbal and visual cues (e.g., Araujo 2018; Moussawi and Benbunan-Fich 2020; Diederich et al. 2020; Seeger et al. 2021). For example, Moussawi and Benbunan-Fich (2020) showed that humorous comments can make chatbots appear more human-like. While these studies provide valuable knowledge on the impact of verbal and visual cues (e.g., human names, human-like avatars), less is known about other types of social cues such as response time (Feine et al. 2019).

2.3 Chatbot Response Time

Response time – also referred to as response latency or response delay – falls in the category of chronemic cues, which capture temporal aspects of communication (Walther and Tidwell 1995; Littlejohn and Foss 2009). It is an important social cue in human–human communication (Kalman et al. 2013; Schuetzler et al. 2019). In face-to-face interaction, response time is the time it takes a person to start speaking after another person has stopped. In technology-mediated interaction (e.g., instant messaging), it refers to the time it takes a person to respond to the other person’s message as well as the lag time between consecutive messages (Moon 1999). It includes the time needed to read, internalize, and make sense of another person’s message, as well as the time needed to craft and edit a response (Derrick et al. 2013).

In contrast to humans, chatbots can respond almost instantly, as they need only fractions of a second to process user input and generate a response (Følstad et al. 2018; Schuetzler et al. 2021). However, some scholars suggest that instant responses make chatbots appear unhuman-like, reducing the feeling of a natural conversation (Holtgraves and Han 2007; Appel et al. 2012; Schanke et al. 2021). Schuetzler (2015) argues that “it introduces a non-negligible feeling of artificiality to interact with something that can respond instantly to anything you say” (p. 50). Consequently, some researchers and practitioners delay chatbot responses. For example, Holtgraves and Han (2007) employed dynamic delays based on the current message’s number of characters (i.e., 50 ms per character), while Appel et al. (2012) used static delays of 15–30 s. Further, Lufthansa delayed the responses of their chatbot Mildred after customers complained about its instant response time (Crozier 2017).
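To make the two delay approaches concrete, here is a minimal Python sketch of both schemes; the 50 ms-per-character rate follows Holtgraves and Han (2007), the 15–30 s range follows Appel et al. (2012), and the function names are ours:

```python
import random

def dynamic_delay_seconds(message: str, ms_per_char: float = 50.0) -> float:
    """Dynamic delay scaled to the response's length (50 ms per character)."""
    return len(message) * ms_per_char / 1000.0

def static_delay_seconds(low: float = 15.0, high: float = 30.0) -> float:
    """Static delay drawn from a fixed range (15-30 s)."""
    return random.uniform(low, high)

reply = "Your current plan includes 2 GB of data."
print(dynamic_delay_seconds(reply))  # 40 characters -> 2.0 s
```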

Unfortunately, existing research on the role of chatbot response time is scant and presents inconsistent findings that warrant further investigation. Moon (1999) showed that medium response times lead to higher persuasiveness (compared to instant and long response times). Gnewuch et al. (2018) found dynamically delayed responses to increase perceived humanness, social presence, and satisfaction. In contrast, Holtgraves et al. (2007) found that instant response times lead to more favorable personality perceptions. Schanke et al. (2021) showed that a chatbot with dynamically delayed (70 words per minute) rather than instant responses yielded lower likability. Against this backdrop, we investigate whether differences in users’ expectations based on prior chatbot experience can reconcile the inconsistent findings in the extant literature.

2.4 Expectancy Violations Theory (EVT)

Expectancy violations theory (EVT) (Burgoon 1978, 1993) relates to the impact of nonverbal behavior violations in human–human interaction. Although originally developed to understand proxemic violations (e.g., distancing), EVT has subsequently been expanded to cover other forms of nonverbal and verbal violations (e.g., addressing others by their first names without permission, putting a hand on another person’s shoulder) (Burgoon 2009). More recently, research has shown that EVT also extends to human–computer interaction (Burgoon 2015) and can explain how violating expectations associated with conversational agents and robots influences how users interact with and evaluate them (e.g., Spence et al. 2014; Burgoon et al. 2016; Grimes et al. 2021).

In essence, EVT posits that people have expectations regarding the nonverbal behavior of human and non-human counterparts. When these expectations are violated, people shift their attention toward the source of the violation and attempt to assign meaning to the violation. For example, an embodied conversational agent (ECA) could have a human-like visual appearance (e.g., an interactive 3D avatar) but use a mechanical-sounding voice (Burgoon et al. 2016). Then, the ECA’s voice might violate the expectations users formed based on the ECA’s visual appearance. In such a situation, users attempt to make sense of and interpret the violation, asking why the ECA appears human-like in one dimension but not in another. Based on this sense-making process, expectancy violations can be viewed as negative or positive, which influences attitudes and communication processes. For example, users can perceive the ECA design as unnatural and therefore react negatively to the violation of their expectations. Finally, EVT posits that positive violations produce more favorable outcomes than positive confirmations of expectations, whereas negative violations produce more negative outcomes than negative confirmations. For example, Burgoon et al. (2016) found that positive expectancy violations (e.g., adding a text transcript to an ECA) had more favorable effects on task attractiveness than positive confirmations.

Several studies have investigated chronemic expectancy violations in technology-mediated interaction between humans (e.g., using email or instant messaging). These violations relate to a person’s expectations about when to anticipate receiving a response to a message they sent (Sheldon et al. 2006; Kalman and Rafaeli 2011). For example, Kalman and Rafaeli (2011) showed that email response time affects a person’s evaluation of the email sender. Additionally, Sheldon et al. (2006) demonstrated that delayed response time in online collaboration tasks influenced a person’s evaluation of the collaborator. Thus, similar to expectations about other forms of nonverbal behavior, people hold expectations about response times. As EVT suggests, these expectations are formed through experiences a person has had (Burgoon 2009). Therefore, users’ expectations about a chatbot’s response time may depend on their prior experience with other chatbots.

3 Hypotheses Development

Drawing on SRT and EVT, this study investigates how chatbot response time and users’ prior chatbot experience influence social presence perceptions and chatbot usage intentions. Two types of response time are examined, namely, instant and delayed. For prior chatbot experience, we distinguish between novice users (i.e., users who have not interacted with a chatbot before) and experienced users (i.e., users who have used chatbots before). Our research model shown in Fig. 1 captures our research hypotheses on the effect of response time on social presence (H1), the moderating role of prior chatbot experience (H2 & H4), and the mediating role of social presence in the relationship between response time and intention to use (H3).

Fig. 1
figure 1

Research model

3.1 Effect of Chatbot Response Time on Social Presence (H1)

According to SRT, humans tend to respond to technology in much the same way as they respond to other humans (Reeves and Nass 1996; Nass and Moon 2000). The fundamental assumption is that when technology is imbued with social cues (e.g., human-like appearance, natural language), users perceive the technology as a social actor and feel a sense of personal, sociable, and sensitive human contact during the interaction (Reeves and Nass 1996; Lee and Nass 2005). Based on this rationale, we hypothesize that during the interaction with a chatbot, a delayed response time more closely resembles how a human counterpart would respond and therefore serves as a social cue that triggers social responses in users. Consequently, a delayed response time might increase the feeling of a natural conversation and enhance users’ social presence perceptions. In contrast, an instant response time could introduce a “non-negligible feeling of artificiality” (Schuetzler 2015, p. 50) because human counterparts would need some time to read a message, make sense of it, and enter their response. Consistent with this reasoning, reports from practice suggest that users were “irritated that [a chatbot] replied to their questions unnaturally fast” (Crozier 2017), indicating that an instant response time could reduce social presence perceptions. Therefore, based on SRT, we propose that:

H1

A chatbot with a delayed response time yields higher social presence than a chatbot with an instant response time.

3.2 Moderating Effect of Prior Chatbot Experience on Social Presence (H2)

SRT posits that users tend to generalize expectations from human–human interaction to human–computer interaction, regardless of individual user characteristics (Nass et al. 1994; Nass and Moon 2000). These expectations are especially pronounced when users interact with chatbots since they “use cues of humanness to create a sense of social presence that is not present in traditional information systems such as websites, applications, and databases” (Grimes et al. 2021, p. 2). However, recent studies suggest that users also transfer expectations from their previous interactions with other chatbots to their current interaction with a chatbot (Gambino et al. 2020; Moussawi et al. 2020; Cambre et al. 2021). Therefore, users who have prior experience with chatbots are likely to have different expectations of a chatbot than users who have never interacted with one before.

Against this backdrop, we draw on EVT (Burgoon 1978, 1993) to hypothesize on the moderating role that users’ prior chatbot experience plays in how response time affects social presence perceptions. EVT not only offers a useful lens to theorize about expectancy violations in human–chatbot interaction (Grimes et al. 2021); it also provides deeper insight into violations of expectations that relate to response time – so-called chronemic expectancy violations (Kalman and Rafaeli 2011). In fact, several studies have shown that users over time develop chronemic expectations about when to receive an anticipated response to a message they sent, and that they evaluate their counterpart based on these expectations (Sheldon et al. 2006; Kalman and Rafaeli 2011). Therefore, we propose that the effect of response time on social presence is contingent on users’ prior chatbot experience because users with no previous chatbot interaction have different expectations about a chatbot’s response time than users who have interacted with chatbots before.

SRT builds on the premise that users have pre-existing expectations regarding the interaction with another human in a given situation (e.g., chatting with friends via instant messaging applications such as WhatsApp, Telegram, or Facebook Messenger). These expectations have been developed over time as similar social situations humans encounter are stored in memory and activated when relevant situations arise (Gambino et al. 2020). Such situations include not only interactions with other humans but also interactions with technology (Nass and Moon 2000). Therefore, the same expectations – initially developed for human–human interaction – can extend to interactions with technology in general and chatbots in particular (Go and Sundar 2019). Here, importantly, recent studies suggest that this effect is especially pronounced in human–chatbot interaction because chatbots typically operate in the same environment – a simplistic chat window – that is frequently used to interact with other people (e.g., Araujo 2018; Beattie et al. 2020). In other words, given that human–chatbot interaction occurs in a chat window through natural language on a turn-by-turn basis (McTear 2017), users are more likely to draw on their experience from chatting with other people than from human–computer interactions with rich graphical user interfaces (e.g., websites, mobile apps) where the interaction occurs through clicking, scrolling, or swiping (Følstad and Brandtzæg 2017; Grimes et al. 2021).

Based on the above, we argue that in the absence of relevant experience, users who have not interacted with a chatbot before naturally draw on their experience of chatting with humans (who cannot respond instantly). Following this reasoning, a chatbot with an instant response time would violate novice users’ chronemic expectations. According to EVT, this causes an attentional shift toward the chatbot as the source of the violation. In an attempt to interpret this violation, novice users shift from an automatic to a more deliberate way of thinking about the true nature of their counterpart. This shift, however, may reduce their tendency to respond socially because it interferes with the automatic and unconscious process that would trigger social responses (Nass et al. 1994). Conversely, a delayed response time might not violate novice users’ chronemic expectations because it more closely resembles their previous interactions with humans. Therefore, an attentional shift is less likely and users’ tendency to respond socially to the chatbot is not negatively affected. Consequently, we argue that the positive impact of a delayed (as opposed to instant) response time on social presence is particularly strong for novice users.

In contrast, users who have used chatbots before might have formed different response time expectations, since chatbots in practice usually respond instantly (Lester et al. 2004; Schuetzler et al. 2021). Although it is possible that not all chatbots have instant response times, reports from practice suggest that optimizing the response time to deliver fast responses is an established chatbot design guideline (e.g., “Your chatbot needs to be fast; if it’s not, it won’t get used”; SysAid Technologies 2019). Therefore, experienced users’ expectations would not be violated by an instant response time. Conversely, given that experienced users likely have experienced instant responses before, a delayed chatbot response time could violate their expectations. As Burgoon et al. (2016) noted, an “unexpected delay in response […] may draw the human’s attention to the delay [who] will attempt to make sense of why the delay exists” (p. 7). In an attempt to make sense of the violation, experienced users might become more thoughtful and their tendency to (automatically and unconsciously) respond socially to the chatbot will be reduced. Consequently, experienced users’ social presence perceptions might be reduced when they interact with a chatbot that has a delayed response time. Hence, based on EVT, we propose that:

H2

Users’ prior chatbot experience moderates the effect of chatbot response time on social presence, such that the effect of a delayed (instant) response time is stronger for novice (experienced) users.

3.3 Mediating Effect of Social Presence on Intention to Use (H3)

As noted before, SRT posits that users treat technologies imbued with social cues as social actors rather than just tools to use. Therefore, a delayed response time, which more closely resembles human behavior than an instant response time, might serve as a social cue that enhances users’ social presence perceptions when they interact with a chatbot. SRT further proposes that users develop social relationships with technologies that are treated as social actors (Nass et al. 1996; Moon 2000; Fogg 2002). These relationships create an emotional bond between users and technologies (Nass et al. 1996) and thereby increase users’ intention to use them again in the future. Consequently, an increase in social presence might also lead to higher usage intentions toward a chatbot. In line with this reasoning, Xu et al. (2018) showed that users who have developed a social relationship with a recommendation agent exhibit higher usage intentions. Similarly, a number of studies have demonstrated that social presence is a key factor in driving usage intentions across various technology contexts (e.g., Cyr et al. 2007; Hassanein and Head 2007; Qiu and Benbasat 2009). Considering the relationships between chatbot response time, social presence, and intention to use together, we suggest that a chatbot with a delayed response time enhances users’ social presence perceptions, which in turn increases their usage intentions. Therefore, based on SRT and in accordance with previous studies, we propose that:

H3

Social presence mediates the effect of chatbot response time on intention to use.

3.4 Moderating Effect of Prior Chatbot Experience on Intention to Use (H4)

Typically, users who have never interacted with a chatbot before “expect chatbots to have human-like communication skills” (Janssen et al. 2020, p. 222). Unsurprisingly, chatbots are unlikely to meet these high expectations, which results in user frustration and reduced usage intentions (Brandtzaeg and Følstad 2018; Schuetzler et al. 2021). In contrast, experienced users “have grown accustomed to chatbots’ limitations” and therefore do not expect a human-like level of social presence (Jain et al. 2018, p. 895). Such users typically focus on other criteria, such as functionality and features, when deciding to use the chatbot (Liao et al. 2016; Brandtzaeg and Følstad 2017). Against this backdrop, we again draw on EVT to hypothesize that the impact of social presence on chatbot usage intentions is stronger for novice users than for experienced users. According to EVT, negative violations of expectations exert a stronger influence than negative confirmations of expectations (Burgoon 2015; Burgoon et al. 2016). Therefore, novice users’ high expectations (i.e., of human-like social presence levels) might be more strongly violated than experienced users’ expectations. Consequently, experienced users’ usage intentions are less likely to be influenced by their social presence perceptions because they would not expect a chatbot to create a strong sense of social presence in the first place. Based on these considerations, we propose that:

H4

Users’ prior chatbot experience moderates the effect of social presence on intention to use the chatbot, such that the effect is stronger for novice users than for experienced users.

4 Method

4.1 Experimental Conditions

To test our hypotheses, we conducted a between-subjects lab experiment in which 202 participants interacted with a chatbot that responded either instantly (INST) or with a delay (DLY). Participants were randomly assigned to one of these conditions. In the instant response time condition (INST), the chatbot responded instantly without any delay (i.e., as fast as technically possible). Since sending a message involved a short network delay caused by the physical limits of data transmission over the Internet, this corresponded to a response time of about 200 to 400 ms, which is comparable to chatbots with instant response times used in practice. A total of 67 participants (33.2%) interacted with the chatbot that had an instant response time. In the delayed response time condition (DLY), the chatbot responded with a delay of 2.3 s on average. To ensure that the results do not rest on a specific type of delay, we considered both delay approaches described in the literature (i.e., static and dynamic delays). Because both delay types were nested within the DLY condition, it included twice as many participants as the INST condition (i.e., 135 participants, 66.8%). Our analysis confirmed that there was no difference in user perceptions between static and dynamic delays (see Appendix A, available online via https://doi.org/10.1007/s12599-022-00755-x). Hence, our analysis does not differentiate between the two types of delay.
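The exact randomization mechanism is not spelled out in the paper; the following is a minimal sketch, assuming equal allocation across the three sub-conditions (instant, static delay, dynamic delay), which reproduces the roughly 1:2 split between INST and DLY:

```python
import random

# Assumed: equal allocation across three sub-conditions, which yields a
# 1:2 INST:DLY ratio once the two delay variants are collapsed. This is an
# illustration, not the authors' documented procedure.
CONDITIONS = ["INST", "DLY_STATIC", "DLY_DYNAMIC"]

rng = random.Random(42)
sample = [rng.choice(CONDITIONS) for _ in range(202)]
n_inst = sum(c == "INST" for c in sample)
print(n_inst, 202 - n_inst)  # approximately 67 vs. 135
```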

4.2 Experimental Procedure and Task

The experiment took place at the Karlsruhe Decision and Design Lab (KD2Lab), adhering to its procedural and ethical guidelines. First, upon arrival at the lab, participants read and signed an informed consent form. Subsequently, they sat down at a computer and received instructions on the experiment. These instructions introduced participants to the hypothetical scenario of using a chatbot to find out whether they could save money by changing their mobile phone plan. Additionally, we provided a fictitious copy of last month’s mobile phone bill, which indicated that the current mobile phone plan did not fit participants’ actual usage patterns (e.g., their data usage was much higher than the volume included in their plan, resulting in high additional costs) (see Figure A1 in Online Appendix A). Additionally, the instructions clarified that participants were about to interact with a chatbot and not with another human being. Subsequently, participants interacted with a chatbot in one of the experimental conditions. After identifying a more suitable mobile phone plan based on the chatbot’s recommendation and ending the conversation, participants completed a post-experiment questionnaire that captured the dependent variables social presence and intention to use, prior experience with chatbots, the manipulation check (i.e., perceived response time), and several controls. After completing the questionnaire, we debriefed participants and compensated them with €7 for their participation.

4.3 Experimental Chatbot

For the experiment, we developed different versions of a chatbot using the Microsoft Bot Framework (Microsoft 2021). The chatbots were able to advise participants on different mobile phone plans and help them save money by recommending plans that better fit their actual usage patterns. As illustrated in Fig. 2, the chatbots asked participants several questions about their current plan and usage patterns (e.g., data volume used last month), but they were also able to answer other questions related to mobile phone plans. Participants interacted with the chatbots by formulating and sending their own messages to increase the realism of the chatbot interaction. To process natural language user input (e.g., to recognize user intentions and extract entities in a user’s message, such as the names of different mobile phone plans), the chatbots used Microsoft’s Language Understanding Intelligent Services (LUIS). We trained all chatbots on the same language model and implemented identical dialogs. The only difference between the chatbots was their response time according to the experimental conditions.
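The actual chatbots were built with the Microsoft Bot Framework and LUIS; the Python sketch below only illustrates the turn logic, with a hypothetical recognize_intent stub standing in for LUIS and delay parameters modeled on the conditions described above:

```python
import time

def recognize_intent(text: str) -> str:
    """Hypothetical stand-in for the LUIS language model used in the experiment."""
    return "data_volume" if "data" in text.lower() else "fallback"

RESPONSES = {  # illustrative dialog content, not the actual dialogs
    "data_volume": "How much data did you use last month?",
    "fallback": "Sorry, could you rephrase that?",
}

def handle_turn(user_message: str, condition: str) -> str:
    reply = RESPONSES[recognize_intent(user_message)]
    if condition == "DLY_STATIC":
        time.sleep(2.3)                # static delay near the reported 2.3 s average
    elif condition == "DLY_DYNAMIC":
        time.sleep(len(reply) * 0.05)  # dynamic delay, e.g., 50 ms per character
    return reply                       # INST: respond as fast as technically possible
```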

Fig. 2
figure 2

Excerpts from an exemplary chatbot conversation during the experiment

4.4 Participants

We recruited participants from a European university student pool. Given the rather low chatbot adoption in the general population (SmartAction 2018; Inmar 2019), we considered students to be appropriate for this study because they are among the early chatbot adopters (Brandtzaeg and Følstad 2017; Tuzovic and Paluch 2018). An a priori power analysis using G*Power (Faul et al. 2007) with a significance level of 0.05 determined a minimum sample size of 207 participants to achieve a statistical power of 0.90 for detecting a medium effect size (f = 0.25). As we anticipated that some participants might encounter (technical) difficulties in their interaction with the chatbot, we aimed for a sample size of about 220 participants.
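The reported minimum sample size can be approximated with statsmodels; note that k_groups = 3 (instant, static delay, dynamic delay) is our assumption, under which the result comes out close to the 207 participants reported:

```python
import math
from statsmodels.stats.power import FTestAnovaPower

# A priori power analysis for a one-way ANOVA: medium effect (f = 0.25),
# alpha = 0.05, power = 0.90, three groups (assumed).
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.90, k_groups=3)
print(math.ceil(n_total))  # approximately 207
```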

In total, 219 subjects participated in the study. We excluded 13 participants because they provided incorrect answers to the attention check questions in the questionnaire (e.g., “If you are carefully filling out the survey, please select ‘Strongly agree’”). Additionally, we screened the protocols of each conversation as prior chatbot studies did (e.g., Go and Sundar 2019) to filter out six participants who did not follow the instructions (e.g., ended the conversation before the chatbot had recommended a plan) and/or purposefully provided invalid inputs during the conversation with the chatbot (e.g., entered negative or unrealistically high values when asked about how much they would be willing to pay for a new mobile phone plan). Therefore, our final sample included 202 participants (77 female, 125 male). Participants were between 18 and 41 years old (M = 23.21, SD = 3.45). Online Appendix B shows the participants’ demographic and personal characteristics.

4.5 Measures

We used previously validated measures for all constructs and slightly adapted them to the context of this study. We assessed social presence using the items from Gefen and Straub (1997) and intention to use using the items from Wang and Benbasat (2009) on seven-point Likert scales (1 = “strongly disagree”; 7 = “strongly agree”). Similar to previous studies (e.g., Ashktorab et al. 2019; Xu 2019; Moussawi et al. 2020), we measured prior chatbot experience and distinguished between novice and experienced users by asking participants whether and how often they use chatbots (five-point Likert scale; 1 = “never”; 5 = “daily”). This resulted in a roughly even split between novice users (i.e., users who have not interacted with a chatbot before) and experienced users (i.e., users who have used chatbots before). Additionally, we collected demographic information from participants (i.e., age, gender) and assessed their level of sociability (Bruch et al. 1989) as potentially relevant control variables. Online Appendix B lists all measurement items.

Given that these items were measured with the same method, we tested for common method variance using Harman’s single factor test (Podsakoff et al. 2003). This test showed that no single major factor emerges and that the first factor accounts for only 33.0% of the total variance. Moreover, our main independent variable – chatbot response time – represents an experimental manipulation and is therefore not measured via self-report. Thus, common method bias is not a serious concern in this study.
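A minimal sketch of Harman’s single factor test, assuming the self-reported items are collected in a pandas DataFrame and using the factor_analyzer package (variable names are ours):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

def harman_single_factor(items: pd.DataFrame) -> float:
    """Proportion of total variance explained by a single unrotated factor."""
    fa = FactorAnalyzer(n_factors=1, rotation=None)
    fa.fit(items)
    _, proportion, _ = fa.get_factor_variance()
    return proportion[0]

# A value well below 0.50 (here: 0.33) indicates no single dominant factor
# and hence no serious common method variance concern.
```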

5 Results

5.1 Manipulation Check

To check our manipulation of the chatbots’ response time, each participant rated the response time using a seven-point Likert scale (1 = “slow”; 7 = “fast”) (Galletta et al. 2006). The results of a t-test showed that participants in the INST condition (M = 6.67, SD = 0.70) perceived the chatbot as significantly faster than participants in the DLY condition (M = 5.81, SD = 1.15; t(200) = 5.61, p < 0.001). Therefore, we conclude that our manipulation was successful in influencing participants’ perception of the chatbots’ response time as either instant or delayed.
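The reported t-test can be reproduced from the summary statistics alone, for example with SciPy:

```python
from scipy.stats import ttest_ind_from_stats

# Manipulation check: perceived response time, INST vs. DLY (pooled-variance t-test)
t, p = ttest_ind_from_stats(mean1=6.67, std1=0.70, nobs1=67,
                            mean2=5.81, std2=1.15, nobs2=135)
print(round(t, 2), p < 0.001)  # t ≈ 5.6 on df = 200, p < 0.001
```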

To confirm the successful randomized assignment of participants to our experimental conditions, we conducted several t-tests and chi-square difference tests. There were no significant differences in prior chatbot experience (χ2(1) = 0.087, p = 0.768), age (t(200) = −0.468, p = 0.640), gender (χ2(1) = 0.393, p = 0.530), sociability (t(200) = 0.778, p = 0.437), education (χ2(3) = 4.45, p = 0.216), prior messenger experience (t(200) = −1.80, p = 0.073), and chat duration (t(200) = 0.380, p = 0.704) between the INST and DLY conditions.

5.2 Measurement Model Assessment

We assessed the measurement model by examining indicator reliability, internal consistency, convergent validity, and discriminant validity of all latent constructs (i.e., social presence, intention to use, and sociability). After dropping three items of the sociability control variable due to low factor loadings, all remaining items loaded significantly onto their intended factor with loadings ranging from 0.70 to 0.95, thus supporting indicator reliability (Gefen and Straub 2005) (see Online Appendix B). Next, to demonstrate the constructs’ internal consistency, we calculated the composite reliability (CR) and Cronbach’s alpha (CA) for each construct. As shown in Table 1, all constructs exceeded the recommended threshold of 0.70 for both CR and CA (Nunnally and Bernstein 1994). Also, all average variance extracted (AVE) values exceeded the suggested threshold of 0.50 (Fornell and Larcker 1981). Finally, the square root of the AVE of each construct was higher than the correlations with other constructs, thus supporting discriminant validity (Fornell and Larcker 1981). Taken together, these results support the reliability and validity of the measurement model.

Table 1 Internal consistency and discriminant validity of constructs
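For reference, CR and AVE follow directly from standardized factor loadings; a short sketch with illustrative loadings (within the reported 0.70–0.95 range, not the actual values):

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

example = [0.85, 0.88, 0.90, 0.82]  # illustrative only
print(composite_reliability(example) > 0.70)       # True
print(average_variance_extracted(example) > 0.50)  # True
```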

5.3 Descriptive Results

Table 2 provides descriptive statistics with means and standard deviations for all constructs in the instant and delayed response time conditions.

Table 2 Descriptive statistics

5.4 Hypotheses Testing

To test our hypotheses, we used Hayes’ PROCESS macro for SPSS (Hayes 2018). Since our research model involved first and second stage moderated mediation (Edwards and Lambert 2007), we conducted a moderated mediation analysis with 5,000 bootstrap samples using Model 58 (Hayes 2018). In this analysis, chatbot response time served as the independent variable (0 = instant, 1 = delayed), prior chatbot experience (0 = novice, 1 = experienced) as the moderator, social presence as the mediator, and intention to use as the dependent variable (see Fig. 3). Finally, we included age, gender, and sociability as covariates in the analysis. Table 3 shows the analysis results in more detail.

Fig. 3
figure 3

Moderated mediation model with path coefficients

Table 3 Estimates of direct effects and interactions from the moderated mediation model
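In Model 58, the moderator enters both the a-path (response time → social presence) and the b-path (social presence → intention to use). The following is a minimal bootstrap sketch of the conditional indirect effect in Python – not PROCESS itself – with covariates omitted and column names (x, m, y, w) that are ours:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def conditional_indirect(data: pd.DataFrame, w: int) -> float:
    """Indirect effect of response time (x) on intention to use (y) via social
    presence (m) at moderator value w (0 = novice, 1 = experienced)."""
    a = smf.ols("m ~ x * w", data=data).fit().params      # a-path: x, w, x:w
    b = smf.ols("y ~ x + m * w", data=data).fit().params  # b-path: m, w, m:w
    return (a["x"] + a["x:w"] * w) * (b["m"] + b["m:w"] * w)

def bootstrap_ci(data: pd.DataFrame, w: int, reps: int = 5000, seed: int = 1):
    """Percentile bootstrap CI for the conditional indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = [conditional_indirect(data.iloc[rng.integers(0, n, n)], w)
             for _ in range(reps)]
    return np.percentile(draws, [2.5, 97.5])  # 95% CI
```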

5.4.1 Social Presence (H1 & H2)

The results show that while a delayed rather than an instant response time is associated with higher social presence (H1 supported: b = 0.69, p < 0.05), this relationship is moderated by prior chatbot experience (H2 supported: b = −1.20, p < 0.01). Specifically, the conditional effects reveal that delayed response time has a positive effect on social presence for novice users (b = 0.69, p < 0.05, 95% CI [0.14, 1.23]), whereas the effect is negative for experienced users (b = −0.51, p < 0.05, 95% CI [−1.02, −0.01]). Hence, it is important to note that although H1 is supported, this positive effect of a delayed (vs. instant) response time on social presence holds only for novice users, but not for experienced users. To clarify the nature of this interaction, we performed a spotlight analysis (Spiller et al. 2013). As depicted in Fig. 4, in the INST condition, social presence is significantly higher for experienced users than for novice users (b = 1.13, t = 3.64, p < 0.001), whereas in the DLY condition, the difference in social presence between experienced and novice users is not significant (b = −0.06, t = −0.28, p = 0.78). This shows that the opposing effects of a delayed response time on social presence are driven by the differences in how novice and experienced users perceive a chatbot that responds instantly.

Fig. 4
figure 4

Interaction effect between chatbot response time and prior chatbot experience on social presence (H2)

5.4.2 Intention to Use (H3 & H4)

Regarding the right-hand side of the model, the analysis shows that social presence positively influences intention to use (b = 0.66, p < 0.001), though this effect is moderated by the user’s prior chatbot experience (H4 supported: b = −0.36, p < 0.05). Specifically, the conditional effects reveal a positive effect of social presence on intention to use for both novice users (b = 0.66, p < 0.001, 95% CI [0.41, 0.91]) and experienced users (b = 0.30, p < 0.01, 95% CI [0.08, 0.53]). Still, the effect of social presence on intention to use is twice as large for novice users as it is for experienced users. Providing support for the mediating role of social presence, the indirect effect of chatbot response time on intention to use is significant for both novice users (b = 0.45, SE = 0.19, 95% CI [0.12, 0.86]) and experienced users (b = −0.15, SE = 0.10, 95% CI [−0.37, −0.002]), while the direct effect is not (b = −0.12, SE = 0.23, 95% CI [−0.57, 0.34]) (see Table 4). This finding supports H3. In addition, the index of moderated mediation, which tests whether the indirect effect varies across levels of the moderator (Hayes 2015), is significant (index = −0.61, SE = 0.21, 95% CI [−1.04, −0.22]). This result indicates that the indirect effects of chatbot response time on intention to use via social presence significantly differ for novice vs. experienced users. More specifically, a delayed response time has a positive effect on novice users’ intention to use (via social presence), whereas the mediated effect is negative for experienced users.

Table 4 Direct effect and conditional indirect effects of chatbot response time on intention to use

To understand the moderating role of chatbot experience (H4) in greater detail, we conducted a floodlight analysis using the Johnson–Neyman technique (Spiller et al. 2013; Finsaas and Goldstein 2021) to test at which levels of social presence the impact on intention to use differs between novice and experienced users. We find that the difference between novice and experienced users is significant for social presence values between 1 and 2.74 (see Fig. 5). This result suggests that the usage intentions of novice users who perceive a low social presence are more strongly affected than the usage intentions of experienced users who perceive a similarly low social presence. In other words, novice users are less likely to use a chatbot when social presence is low (≤ 2.74), while low social presence seems less critical for experienced users’ intention to use a chatbot.

Fig. 5
figure 5

Interaction effect between social presence and prior chatbot experience on intention to use (H4) Note. Shaded area represents levels of social presence, where the difference between novice and experienced users is significant at the .05 level
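The Johnson–Neyman procedure itself can be sketched as follows, assuming a simplified OLS fit of intention to use (y) on social presence (m), experience (w), and their interaction, without the covariates of the full model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

def jn_region(data: pd.DataFrame, alpha: float = 0.05) -> np.ndarray:
    """Values of social presence (m) at which the novice/experienced (w)
    difference in intention to use (y) is significant."""
    fit = smf.ols("y ~ m * w", data=data).fit()
    b, cov = fit.params, fit.cov_params()
    grid = np.linspace(1, 7, 121)              # the 7-point scale range
    effect = b["w"] + b["m:w"] * grid          # group difference at each m
    se = np.sqrt(cov.loc["w", "w"]
                 + grid ** 2 * cov.loc["m:w", "m:w"]
                 + 2 * grid * cov.loc["w", "m:w"])
    t_crit = stats.t.ppf(1 - alpha / 2, fit.df_resid)
    return grid[np.abs(effect / se) >= t_crit]  # region of significance
```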

6 Discussion

In this study, we investigated how prior chatbot experience might explain the effect that chatbot response time has on users’ social presence perceptions and chatbot usage intentions. Our results reveal opposing effects of a delayed response time and shed light on the differences between novice and experienced chatbot users. First, in line with our expectations, a delayed (as opposed to instant) response time enhances novice users’ social presence perceptions. However, we find the opposite effect for experienced users: a delayed response time actually reduces their social presence perceptions. This finding not only highlights the important role of users’ prior chatbot experience, but also suggests that under certain circumstances, incorporating artificial delays in a chatbot can backfire. Experienced users may find a delayed response time irritating or annoying because they are aware that a chatbot can answer instantly, which could lead to social presence perceptions being lower than they would otherwise have been.

Second, our results show that social presence mediates the effect of chatbot response time on usage intentions, and that this mediation is moderated by prior chatbot experience. Corroborating our previous findings, the indirect effect of a delayed response time on chatbot usage intentions via social presence is positive for novice users but negative for experienced users. This finding suggests that a delayed response time not only has opposing effects on users’ immediate social presence perceptions, but also leads to markedly different downstream consequences on usage intentions. Although a longitudinal study would be necessary to fully understand how the relationship between social cues, such as response time, and usage intentions evolves over time, there is reason to believe that the positive effect of social cues in chatbot design could disappear or even become negative as users gain experience through ongoing interactions with chatbots.

Third, our results reveal how the impact of social presence on usage intentions differs depending on users’ prior chatbot experience. In general, the impact is stronger for novice than for experienced users, indicating that with increased experience, the importance of social presence as a determinant of chatbot usage intentions could decrease. Interestingly, we also find that when social presence perceptions are low, experienced users have higher usage intentions than novice users. This finding suggests that experienced users might be more tolerant of lower social presence perceptions than novice users when forming their intention to use a chatbot. Additionally, it could indicate that experienced users prefer less “humanized” chatbot designs because these users tend to focus on the utility of a chatbot rather than its ability to provide a human touch (Brandtzaeg and Følstad 2017). In the following section, we discuss the implications of our findings for theory and practice, highlight potential limitations, and outline directions for future research.

6.1 Theoretical Implications

Our findings contribute to theory in three ways. First, we extend SRT by identifying prior experience as a key moderating factor that shapes users’ social responses to chatbots. This finding challenges a core assumption of SRT, namely that social responses occur automatically and unconsciously without being “confined to a certain category of people” (Nass and Moon 2000, p. 98). According to this view, chatbot response times that more closely resemble human behavior should trigger social responses in all users, regardless of their individual characteristics. However, our finding that for chatbot response time this effect only holds for novice users and not for experienced users suggests that prior experience presents an important theoretical boundary condition that affects whether and how users respond socially to a chatbot, pointing to a more complex mechanism than previously thought. More broadly, it highlights the need for extending SRT to consider that individual differences between users in terms of prior experience with a particular technology can affect how they respond to specific design features (here, response time). Through integrating ideas from EVT (Burgoon 1978, 1993), our research can serve as a starting point to guide the extension of SRT by considering the expectations users bring to the interaction and how violating them can ultimately affect users’ social responses to technology.

Second, we offer an explanation for earlier inconsistent findings regarding the role of response time in the context of chatbots (Holtgraves et al. 2007; Appel et al. 2012; Gnewuch et al. 2018; Schanke et al. 2021), as well as websites and mobile apps (Galletta et al. 2004; Buell and Norton 2011; Yu et al. 2020; Tsekouras et al. 2022), by introducing users’ prior experience as an important contingency factor. Specifically, our findings help reconcile inconsistencies in the literature by clarifying the conditions under which instant or delayed chatbot responses result in positive outcomes (i.e., increased social presence, higher usage intentions) – instant responses for experienced users and delayed responses for novices. Tsekouras et al. (2022) made a similar observation in finding that recommendation agent users who were more familiar with a given product context (e.g., searching for cars on the Internet) reacted less positively to a delay in product recommendations being generated compared to users with low familiarity. Collectively, these findings suggest that investigations into the effects of response time should take users’ prior experience with the technology itself or the context in which it is used into consideration.

Finally, existing research in IS and related disciplines has mostly focused on verbal and visual cues, such as human names or human-like avatars, and how they trigger social responses in users (e.g., Hess et al. 2009; Qiu and Benbasat 2009; Araujo 2018). Our study extends this research stream by examining response time – an important cue that falls into the category of chronemic (i.e., time-related) cues and has received very little attention in the literature (Feine et al. 2019). Although chronemic cues are often overlooked due to their invisibility (Littlejohn and Foss 2009), our findings on the opposing effects of a delayed response time show that they can strongly impact users’ perceptions of a chatbot. Further, related studies on voice assistants (Porcheron et al. 2018) and physical robots (Shiwa et al. 2009) suggest that the importance of response time as a social cue is not limited to the chatbot context. Consequently, response time should be recognized as an important design feature that researchers and practitioners can deliberately design and control when implementing chatbots and other types of conversational agents.

6.2 Practical Implications

Our findings also have important practical implications. Currently, most organizations and individuals who design, develop, or implement chatbots and other types of conversational agents (e.g., voice assistants like Alexa) follow a “one-design-fits-all” approach by focusing on one design that meets the expectations of most potential users at that point in time. However, our findings suggest that a single design sooner or later will run into difficulties in meeting expectations because users have different expectations that can evolve as they gain experience. Therefore, a “one-design-fits-all” approach could be one reason why companies struggle to increase the adoption and use of their chatbot. For example, novice users might prefer a more “humanized” design (e.g., delayed responses), while experienced users could be irritated by a chatbot that aims to closely mimic human conversation; instead, they might prefer a more “machine-like” design (e.g., instant responses). Therefore, a key practical implication of our study is that chatbots need to be personalized based on the characteristics and preferences of their users. Although a discussion of personalization strategies for chatbots is beyond the scope of this paper, we recommend that practitioners explore how users could manually personalize a chatbot according to their preferences (e.g., allowing users to enable or disable certain design features) and how to develop chatbots that are capable of automatically adapting themselves to a user (e.g., based on available user data or previous interactions).

Second, our study highlights the need for sensitivity to seemingly minor design features and their impact on users’ perceptions of chatbots. Designers and organizations often put effort into giving their chatbot a “personality” with a human name (i.e., a verbal cue) and a human-like avatar (i.e., a visual cue) (Araujo 2018), yet chronemic cues that relate to temporal aspects of the interaction are easily overlooked. Our findings suggest that attention to such detail is important, particularly because chronemic cues, such as response time, are inherent to natural language interaction and therefore central to any technology equipped with a conversational user interface. Consequently, practitioners would be well advised to add language experts or psychologists to their development teams, thereby drawing on their knowledge of human conversation in designing chatbot conversations.

6.3 Limitations and Future Research

There are limitations to this study, which could open up avenues for future research. The first limitation relates to the operationalization and measurement of prior experience as a dichotomous variable (i.e., novice vs. experienced), which does not take different levels of experience with chatbots into account. However, we believe that the distinction between novice users and experienced users is appropriate for two main reasons. First, understanding the difference between novice and experienced users, rather than the difference between users with different levels of experience, is particularly important in the context of adoption and use (Thompson et al. 1994; Taylor and Todd 1995; Galletta and Dunn 2014). Moreover, recent chatbot studies specifically highlight the difficulties novice users face when making the “transition” to becoming experienced users (e.g., Jain et al. 2018; Muresan and Pohl 2019). Second, due to the low adoption of chatbots in the general population, a large part of our sample (46%) consisted of novice users (see Online Appendix B). In situations with such an uneven distribution across categories, a reasonable strategy is to collapse a categorical variable by combining several of its categories (Babbie et al. 2018). Nevertheless, we acknowledge that future studies need to expand on our findings by conducting a finer examination along the continuum of chatbot experience in order to assess how different levels of experience influence users’ social responses to chatbots. Another promising avenue for future research could be to examine the impact of other individual user characteristics (e.g., age, personality, digital savviness) that may also lead to differences in their social responses.

A second limitation is the use of a student sample in our experiment, which might not be representative of the entire population (e.g., because students have higher levels of technical knowledge). Therefore, our findings might not be generalizable to the wider population. Nevertheless, we believe that using a student sample here was adequate for several reasons. First, students are among the early adopters of chatbots (Brandtzaeg and Følstad 2017), whereas a major part of the general population has not interacted with a chatbot before (Brandtzaeg and Følstad 2017; Inmar 2019). Therefore, a student population allowed us to recruit novice and experienced chatbot users to participate in the experiment. Second, organizations often implement chatbots to explicitly target the younger, tech-savvy generation (e.g., students) (Xiao and Kumar 2021). Therefore, insight into this group’s perceptions and intentions might be particularly relevant to IS research and practice. Third, a homogeneous group of participants generally helps to maximize internal validity in order to clearly identify the treatment effects without additionally controlling for many other factors (Lynch Jr. 1982; Price et al. 2015). However, future research could further cross-validate our findings with user groups from different population segments.

A third limitation lies in our study design that did not allow us to capture participants’ initial expectations about chatbots before the experiment. In experimental research, it is crucial that participants do not know the experiment’s true purpose (e.g., treatment, group assignment) because this knowledge would influence their behavior and present a threat to internal validity (Orne 1962). Therefore, having asked participants about how quickly – based on their prior experience – they expected to receive an answer from a chatbot would have drawn their attention to this design feature during the experiment, which would have seriously biased participants’ expectations and responses. Consequently, we were unable to capture participants’ actual expectations in detail. For example, it is possible that experienced users had previous interactions with chatbots that did not respond instantly to their messages, even if reports from practice suggest that achieving a fast response time is an established chatbot design guideline (e.g., “Your chatbot needs to be fast; if it’s not, it won’t get used”; SysAid Technologies 2019). Although the approach of “assum[ing] that the expectations exist in the participant population and explor[ing] the consequences of violating the expectancies” is common in EVT studies (Kalman and Rafaeli 2011, p. 57), future research is needed to explicitly measure and analyze users’ expectations about chatbots in more detail. This would be particularly interesting for experienced users whose previous interactions with chatbots might have been different, and therefore led to different expectations. Another promising research direction could be to analyze users’ expectations about other types of social cues (e.g., verbal or visual cues) and the impact of violating expectations surrounding these cues.

Fourth, research suggests that there may be an inverted U-shaped relationship between response time and social presence, such that both very short and very long response times lead to negative user responses (Moon 1999; Tsekouras et al. 2022). However, since the delayed response time of 2.3 s on average in our experiment was still rather short, we could not examine how users evaluate an overly long response time of a chatbot. Therefore, future research should expand on our findings by investigating longer response delays and the potential inverted U-shaped effect of response time.

Fifth, a valuable extension of our research would be to explore users’ assumptions about the delayed response time and how they make sense of the chatbot’s delayed responses. Some users might attribute the response delay to the complexities of AI and natural language processing, which might need a certain amount of time to process a message sent to the chatbot. Other users might think that the delay is caused by a slow network or hardware inefficiencies (e.g., poor internet connection, slow processing speed). Still others might believe that the delay was deliberately introduced to fool them into believing that they are interacting with a real person. Since our study did not investigate how users perceived and made sense of the delay, an in-depth analysis of the users’ thinking process could be a promising avenue for future research. Similarly, in terms of system design, it would be interesting to investigate how these perceptions change if users are given the option to turn the delays on or off.

Finally, interactions between participants and chatbots in our experiment were rather short (i.e., five minutes on average) and did not allow us to track how perceptions and expectations evolve as users gain experience. Therefore, longitudinal studies are required to examine how perceptions and expectations evolve or change when people use a chatbot over a longer period of time.

7 Conclusion

Recent years have seen a rapid increase in the number of organizations that implement chatbots to automate customer service. Most, if not all, of the chatbots are intentionally designed to look and act like humans. Our findings, however, challenge the assumption that chatbots should always mimic human appearance and behavior. Taking users’ prior experience into account, we find that a delayed response time of a chatbot – which more closely resembles human behavior than an instant response time – can have negative effects. While a delayed response time positively influences novice users’ social presence perceptions and chatbot usage intentions, the effect is negative for experienced users. These findings not only highlight the important role of individual user characteristics in human–chatbot interaction, but could also help explain some of the high-profile failures of human-like chatbots (e.g., IKEA Anna) that could not sustain user engagement beyond an initial interest (Brandtzaeg and Følstad 2018). Therefore, a major implication of our study is that the current “one-design-fits-all” approach to chatbot design could be one reason for the ongoing struggle to meet users’ expectations and increase adoption.