Abstract

Most chatbot interfaces in contemporary m-commerce platforms feature a single chatbot that provides recommendations for all product categories. Nonetheless, there is an emerging research interest in multi-chatbot systems that designate multiple chatbots as product/domain-specific advisers. Given the dearth of studies comparing multi-chatbot and single-chatbot interfaces in the m-commerce context, we addressed this research gap by conducting an online between-subjects experiment to explore how the m-commerce chatbot interface type influences source credibility, social presence, trusting beliefs, and purchase intention. Based on 154 valid responses, the single-chatbot interface led to higher social presence and trusting beliefs toward the m-commerce platform than the multi-chatbot interface. Males attributed higher competence to the chatbot and reported higher purchase intention through the m-commerce platform when engaging with the single-chatbot interface than with the multi-chatbot interface. These findings suggest that designating chatbots as product-specific advisers in a multi-chatbot interface, without labels to accentuate expertise, may not prompt users to categorize them as product specialists. Moreover, the multi-chatbot interface could have introduced user confusion and unfamiliarity cues, decreasing trust in the m-commerce platform. The theoretical, design, and managerial implications of these findings are discussed through the lens of the computers-are-social-actors paradigm, source credibility theory, source specialization, the multiple source effect, and m-commerce behavioral research.

1. Introduction

Chatbots are computer programs that utilize text-based dialogue systems to simulate conversational engagement with humans [13]. Today, chatbots are popularly used in mobile platforms, fueled by the advances in artificial intelligence (AI) and natural language processing technology [4] and the significant business trend toward conversational commerce [58]. Chatbots have been deployed to perform various functions, including taking product orders from customers, providing answers to frequently asked questions, and dispensing product recommendations across m-commerce domains such as food (Pizza Hut), apparel (Zalora), travels (Malaysia Airlines), banking services (Bank of America), and multi-categorical products (eBay). By integrating chatbots within readily available chat interfaces familiar to users such as Facebook Messenger, Slack, WhatsApp, and WeChat [3], companies can provide automated personas that are ever ready to receive queries and dispense information anywhere anytime [8]. Currently, some of the popular product recommender chatbot agents implemented in mobile messaging applications are eBay ShopBot, Chatbot Sephora, Chatbot Castorama, and Chatbot H&M. These chatbots simulate the role of a product advisor by first asking a series of questions to users about their preferences, followed by offering personalized product recommendations based on users’ responses.

Despite the increasing popularity of chatbot technology in m-commerce platforms, reports indicated that chatbots were not always well received by users. For instance, 80% of consumers believed that the after-sales services provided by chatbots were inferior to sales services offered by humans [9]. The same report also revealed that while 90% of the consumers thought businesses were ready to implement chatbots, 54% indicated that they would prefer to talk to human agents rather than chatbots. This issue highlights the importance of the social and behavioral aspects of chatbot design from a human-computer interaction (HCI) perspective [1, 10, 11]. As the intention to accept and use a chatbot’s recommendations depends on the user’s social perceptions of the chatbot’s credibility, social presence, and trust in the online platform, it is crucial for chatbots to be designed to accentuate these qualities [1, 2, 6, 11]. According to the computers-are-social-actors paradigm, human-to-human social rules apply to interactions between users and technological artifacts; chatbot designs should therefore be considered carefully to create social cues that evoke appropriate social and behavioral reactions from users during the engagement.

In general, contemporary multi-product category m-commerce platforms feature a single chatbot persona that provides recommendations for all product categories, such as eBay’s ShopBot, Aerie, Spring Bot, and H&M bot. However, recent studies have considered the development of multi-agent and multi-chatbot platforms. For instance, three embodied virtual agents were dedicated to advising on each of the three product category types in multi-product categorical e-commerce websites [12, 13]. Researchers [14] devised three unique chatbots to advise on specific investment fields (PoupancaGuru as savings expert, TDGuru as treasury bonds expert, and CBDGuru as government bonds expert) within a financial advisory platform known as Finch. Accordingly, users can direct their questions to the different chatbots to match the financial investment domain queries with the relevant domain-specific bots. A study [15] operationalized domain-specific chatbots conveying tourism information. The chatbots were assigned to specific travel domains, including nature (the coast, beaches, national parks, etc.); culture (architecture, museums, theaters, and history); and shopping (handmade crafts, malls, and souvenirs). Another study [16] reported the proposed implementation of six distinct embodied virtual agents in a virtual coaching system in the Council of Coaches project. The six embodied virtual agents would take on the specific role of an expert coach for each of the respective health domains: social, cognitive, physical, mental, diabetes, and chronic pain.

Notwithstanding the increased attention toward multi-chatbot systems in the emergent literature, most studies have focused on technical and system factors for establishing multi-chatbot platforms while under-exploring the social, affective, and behavioral effects of interacting with multi-chatbot systems [17–21]. This study, therefore, extends prior works on the multi-chatbot interface to m-commerce, which is one of the domains that utilize chatbot technology extensively. To the best of our knowledge, no study has yet examined the social and behavioral effects of a multi-chatbot interface designating chatbots as product-specific advisors in an m-commerce platform. Hence, this study aims to fill this research gap by addressing the following questions.

RQ1: To what extent do a multi-chatbot and single-chatbot m-commerce interface differently affect perceived chatbot competence [a], trust toward the m-commerce platform (ability, benevolence, and integrity) [b], intention to purchase through the m-commerce platform [c], and perceived social presence [d]?

RQ2: To what extent does a user’s gender moderate the effects of m-commerce chatbot interface (multi-chatbot and single-chatbot) on perceived chatbot competence [a], trust toward the m-commerce platform (ability, benevolence, and integrity) [b], intention to purchase through the m-commerce platform [c], and perceived social presence [d]?

2. Literature Review

2.1. Theoretical Perspectives on Designating Chatbot as Product-Specific Advisers in a Multi-Chatbot Interface

According to the computers-are-social-actors paradigm, people treat technological artifacts using the same social rules derived from human-to-human interaction [22]. It has been postulated that people’s engagements with technological entities are mindless and automatic, as they rely upon overlearned social scripts sourced from their routine social experiences with people [23]. On this basis, technological entities can be devised to hold social cues activating a wide array of learned social scripts in the mind of users engaging with these entities, which influences users’ perceptions and behavioral intentions toward the systems [24]. Among the different social cues, this study focuses on specialty cues [25, 26].

Specialty cues underpin this research because framing chatbots as product/domain-specific advisers can convey social cues that prompt users to categorize the agents as product or domain specialists. Specialty cues can elicit a heuristic assumption that social objects performing tasks in niche domains are specialists and, therefore, hold higher domain expertise and knowledge than generalists performing tasks across diverse domains [27–29]. For instance, social descriptors such as “brain surgeon” and “professor of Korean history” would cause people to perceive these social models as specialists within their niche domains [27]. This study draws on the single functionality as a specialty cue hypothesis, which posits that assigning information sources to a single functionality can act as a cue to trigger a specialization schema [28].

A study [28] tested the single functionality as a specialty cue hypothesis by posting notes on three IoT devices to designate each to uniquely convey weather, traffic, or event information, respectively. The IoT devices assigned to unique functions evoked stronger social presence, higher perceived expertise, and more positive attitudes toward the devices than the IoT devices that shared the same function of dispensing information on weather, traffic, or event. Other studies found similar effects with embodied virtual agents in multi-product category websites; specifically, users ascribed higher perceived expertise toward the agents assigned to specific product categories than the single generalist agent dispensing recommendations for all product categories [12, 13]. However, it is worth noting that the researchers infused the product-specific agents with labels accentuating product specialization, for instance, “Hi, I am Anna — your camera product specialist,” which might have helped prompt the users to categorize the agents as product specialists.

Conversely, the single functionality as a specialty cue effect was not found in other studies. In an automated financial advisory system known as Finch, researchers assigned three unique chatbots as domain-specific experts in financial investment, including PoupancaGuru as savings expert, TDGuru as treasury bonds expert, and CBDGuru as government bonds expert [14]. In contrast, the generalist chatbot, InvestmentGuru, was tasked to advise all investment domains. However, the users did not rate the m-commerce platform’s competence, trust, and effectiveness differently between the domain-specific chatbot and single-chatbot design interfaces.

Through a Wizard of Oz experiment, researchers [15] found no differences in terms of conversation content, user’s speech, and impression toward the system when comparing between the tourism-related multi-bot system that featured domain-specific chatbots (Nature Agent, Culture Agent, and Shopping Agent) and the single-bot platform that had only one generalist agent (Natal Agent). Additionally, users reported more confusion in multi-chatbot engagements and utilized specific strategies to organize turn-taking when conversing with the chatbots.

The confusion within the user-agent interaction corroborates the argument made in the study [30] that users potentially prefer to engage with a consistent source rather than many diverse sources, as the former produces less cognitive load than the latter. Pertinently, the study did not find any main social effects between the multi-agent interface featuring different voice agents for each platform (smartphone, personal cloud computing, smart TV) and the single-agent interface incorporating one voice agent embodying these devices. Nonetheless, the user’s gender moderated the effects — females preferred the single consistent agent, whereas males preferred diverse voice agents.

Researchers [31] endowed each of three different devices (TV, speaker, lamp) with one unique voice agency and found that the multi-agent interface did not produce main effects on social attraction and trust compared with the single-agent interface, in which one voice embodied all the devices. However, the user’s personality was a moderating factor: extroverts reported higher social attraction and trust toward the media technology when interacting with the single-agent frame than with the multi-agent frame, whereas introverts reported stronger social attraction and trust toward the media technology with the multi-agent frame than with the single-agent frame. The researchers argued that extroverts preferred interacting with the “more talkative” agent, as the single voice embodying all the devices afforded a longer conversation duration than the multi-agent interface. Introverts might have preferred the multi-agent interface because interactions with each agent were briefer, given that the agencies were dispersed across the different devices.

2.2. Source Credibility Theory and Trust toward M-Commerce Recommender Platform

The source credibility theory attempts to explain how dimensions of an information source can influence users’ acceptance and use of the message conveyed by the source [32]. The dimensions of source credibility can be distinguished as trustworthiness and expertise [33, 34]. Trustworthiness refers to the extent to which the message source is perceived to be objective and honest, whereas expertise refers to the degree to which the message source is perceived to hold the competence, skills, and knowledge required to dispense quality information [32, 35, 36]. This study draws on the literature surrounding source credibility and the social design of recommender systems, given that the chatbots are product recommenders in this study. Researchers [37] have asserted that technological sources conveying recommendations (recommender platforms) are social actors; therefore, users judge the source credibility of the systems through simple inspections based on exhibited social cues. As people are more likely to accept recommendations from credible sources, the credibility of recommender platforms is crucial in increasing the likelihood of message acceptance [38, 39]. Researchers [37] divided the credibility dimensions of recommender platforms into expertise and trustworthiness. Perceived expertise of a technological source refers to the extent to which the recommender system is perceived to have the ability, competence, and domain knowledge to perform its task effectively [40, 41]. On the other hand, the trustworthiness of a technological source refers to the extent to which the system is perceived as 1) fair and unbiased and 2) motivated to prioritize the user’s interest over its own gain.

Scholars [41] have posited that the source credibility or reputation of a website/vendor could consequently influence the user’s trust toward the recommender agent through a process known as trust transference. That is, buyers rely on cues associated with a trusted “proof source” when dealing with unfamiliar sellers [42]. The transference process can also manifest in reverse insofar as the source credibility of agents representing the website/vendor (chatbot or virtual agent) can affect users’ trusting beliefs toward the platform that affords the agents [11, 13, 43]. Thus, the source credibility of agents translates into trusting beliefs toward the recommendation platform: trust ability, trust benevolence, and trust integrity [41, 44–46]. Trust ability refers to the user’s willingness to accept recommendations based on the perceived competence or expertise of the recommender platform. Trust benevolence is associated with the user’s perception that the recommender system cares and acts in the user’s interest rather than prioritizing profit on behalf of vendors. In contrast, trust integrity refers to the extent to which a user perceives the recommender system as truthful and ethical in providing information and fulfilling its promises [41]. Given that recommender systems are treated as social actors, this conceptualization of trust factors is crucial: even when users find these sources competent, they may not fully trust them because they may believe that the sources hold their own motivations and goals, which are prioritized over the user’s benefits [14, 45].

2.3. Purchase Intention through the M-Commerce Platform

Social cues of anthropomorphic agents in websites can drive users to purchase through the platform [11, 43, 46]. Relatedly, the source credibility of product speakers and celebrity endorsers positively affects purchase intention in digital commerce [36, 47–49]. The effects of source expertise on purchase intention can occur with technological sources conveying social cues, as informed by the computers-are-social-actors paradigm. A recent study has shown that chatbots’ perceived credibility and competence contributed to higher purchase intention [11]. Another study [43] revealed that embodied virtual agents resembling product advisors whose gender (male/female) matched their product gender type (feminine/masculine) elevated the perceived expertise of the technological source and purchase intention through the website. Moreover, an experiment [27] attached labels to infuse specialty cues into smartphone hardware and application, and the cues enhanced the user’s purchase intention toward the advertised product. In the context of a multi-product category e-commerce website, users reported higher purchase intention through the web stores featuring virtual agents framed as product-specific advisors [12, 13].

The literature states that trusting beliefs toward the platform consequently drive users’ intention to purchase through the platform [50, 51]. In a study examining the social presence effects of an embodied virtual agent in a website [46], only trust ability of the website (and not trust benevolence and trust integrity) led to higher patronage intention. An experiment comparing a multi-agent versus a single-agent interface demonstrated that website trust ability and trust benevolence were significant predictors of intention to purchase through the website [12].

2.4. Social Presence and Multiple Source Effect

The role of social presence is crucial in digital commerce, as it drives positive reactions of users toward website trust and purchase/patronage intention [50, 52, 53]. Studies have shown that the evoked sense of social presence from anthropomorphic agents’ social cues affects trust in online platforms [12, 46, 54, 55]. Social presence has been conceptualized as the degree to which users feel that other intelligent beings coexist and interact with them within the digital environment, even when the virtual artifacts are not real-life [28, 56]. This conceptualization has been adopted in some chatbot studies [1, 57]. Relatedly, social presence has also been conceptualized as the degree to which the social cues afforded through interacting with the technological source are perceived as “warm,” “personal,” and “sociable” [52, 53, 58]. The present paper follows the latter definition, as most chatbot studies have adopted it [11, 59–61].

According to the multiple source effect theory, the feeling of social presence sensed by a user in a digital environment will be amplified when numerous voice sources are embedded within the digital environment [62]. The theory asserts that each voice source equals a source of social presence; hence, the presence of multiple voice sources should contribute to a higher sense of perceived social presence. This effect was first shown in a study with a book review website, in which the implementation of multiple artificial voices led listeners to experience higher social presence than the implementation of a single artificial voice [62]. In a multi-agent study [28], domain-specific IoT devices with multiple unique voices for conveying information about weather, traffic, or event, respectively, led to a higher sense of social presence than the generalist IoT device with a single voice for conveying information about weather, traffic, or event. The finding consequently inspired other scholars [12] to reproduce the multiple source effect with embodied virtual agents. Accordingly, users engaging with multiple virtual agents in a website interface reported a greater social presence than those in the single-agent website interface.

2.5. Users’ Gender and Social Cues of Agents

Females and males tend to engage in different processing — while females tend to utilize a peripheral route that focuses on the message source’s affective attributes, males tend to use a central route emphasizing the message content [13, 30]. Relatedly, females have superior processing of socio-emotive cues during social interaction in a technologically mediated environment [63]. Hence, they are better at decoding social cues exhibited by anthropomorphic agents and are more likely to form judgments and attitudes based on these cues [64]. A study [30] showed that females tended to react negatively toward multiple female agent voices while responding more approvingly toward a single consistent female agent voice. Conversely, males preferred the multi-agent scenario where unique female voice agents were assigned to different devices. On the other hand, another experiment [13] revealed that while both males and females reacted positively toward the multi-agent platform, the positive effects of specialty cues derived from the multi-agent interface design were more pronounced for females than males. The lack of studies and contrasting findings concerning gender role in a multi-agent context inspires this study to examine the moderating role of user’s gender in the effects of multi-chatbot and single-chatbot interfaces.

3. Methodology

3.1. Conceptual Framework

This study’s independent variable is the m-commerce chatbot interface type — the multi-chatbot and single-chatbot. Drawing on the literature review, the dependent variables of this study are perceived agent expertise, m-commerce trusting beliefs (ability, benevolence, integrity), intention to purchase through the m-commerce platform, and perceived social presence in the m-commerce platform. Figure 1 illustrates the conceptual framework of this study.

3.2. Stimuli
3.2.1. Chatbot Design

We designed four unique chatbot avatars resembling females in graphical cartoon format to embody the chatbot personas. The agents were gendered as females because female agents convey qualities crucial in human-agent interaction, such as being caring, sincere, and empathic [43]. Moreover, this study considered a cartoonish rather than a realistic form more appropriate for chatbot avatars in practical settings. In light of the frustration and disappointment that may arise because current agent technology has yet to fully match users’ expectations, avatars rendered in cartoonish semblance can help minimize unrealistic expectations toward the chatbot systems [59, 61, 65–68]. Figure 2 illustrates the chatbot avatars representing the multi-chatbot and single-chatbot interfaces.

3.2.2. M-Commerce Chatbot Platform

The avatars were embedded into a mock interface resembling an m-commerce platform integrated within the Facebook Messenger application. With over 300,000 bots as of 2018 [69], Facebook Messenger is one of the most used and familiar messaging applications among users and has been examined in recent user-chatbot interaction research [2, 70]. The mock m-commerce platform offered three product categories: electronics, health products, and home appliances.

3.2.3. Multi-Chatbot and Single-Chatbot Interface Design

Two videos, one for the single-chatbot interface (duration = 375 s) and the other for the multi-chatbot interface (duration = 380 s), were developed to simulate user-chatbot text conversations within the mock m-commerce platform. The use of video as stimuli was similar to the experimental approach found in prior studies examining the effects of agent design (virtual agent and chatbot) on user perceptions [71, 72]. The simulated conversation involved a user (whom the participants were asked to imagine themselves as) seeking recommendations for a laptop, a fitness tracker, and a portable air cooler, thereby linking the user’s queries to the three product categories offered in the m-commerce platform: electronics, health, and home appliances. These products were considered high-involvement because the purchase decision mirrored one’s well-being facets, including health, comfort, and productivity [12, 13]. Moreover, expensive and sophisticated products tend to increase consumers’ involvement in purchase decisions, as consumers would have to bear the consequences of purchase risks such as a defective product or a mismatch between price and value [73].

The single-chatbot interface video simulated the textual interaction between a user and one chatbot (Lydia) assigned to provide recommendations across all three product categories. The user requested suggestions for a laptop (electronic product zone), then a fitness tracker (health product zone), and finally a portable air cooler (home appliance product zone). Lydia alone dispensed recommendations for all of these products. The multi-chatbot interface featured a textual conversation between the user and four chatbots: one designated as a storefront manager (Lydia) and three appointed as product-specific advisors offering recommendations within their respective product categories, namely Chloe, the designated chatbot for the electronic product zone; Eva, the designated chatbot for the health product zone; and Jenny, the designated chatbot for the home appliance product zone. Therefore, each product-related query by the user was answered by the chatbot assigned to the corresponding product zone. Moreover, the three specially designated chatbots were given dialogue scripts in which they introduced themselves as product-specific advisors (“Greetings to you, I am your chatbot advisor who is dedicated to the electronic devices product category”). In addition, the chatbots were given scripts that introduced fellow chatbots as product-specific advisors, including “Let me introduce Chloe, our chatbot product advisor especially for electronic devices, to advice on your preference.” Apart from these introductory dialogues, the scripts conveying the product information were identical between the single-chatbot and multi-chatbot interface designs.

The videos were reviewed by three academics whose research fields were related to human-computer interaction. Minor adjustments were made to improve the video speed and font visibility, per the acquired feedback. Figure 3 illustrates the videos simulating the mock user-chatbot conversations for the multi-chatbot and single-chatbot interfaces.

3.3. Dependent Measures

The questionnaires used in this study were adapted from relevant studies. The items and instructions were all worded in English, as the participants of this study were pooled from undergraduates at a large private university that uses English as the medium of instruction.

3.3.1. Perceived Competence of Chatbot

A five-point Likert scale asked participants to rate the following statements, including “The chatbot(s) is/are expert(s) on the products,” “The chatbot(s) specialize(s) in the products,” and “The chatbot(s) is/are knowledgeable about the products.” These items were adapted from studies that have assessed virtual agents’ perceived expertise in multi-agent interfaces [12, 13, 27].

3.3.2. Perceived Trust Ability toward M-Commerce Platform

A five-point Likert scale asked participants to rate the following statements, including “This m-commerce platform is competent and effective in its interactions with me,” “This m-commerce platform performs all of its roles very well,” “Overall, this m-commerce platform is capable and proficient,” and “In general, this m-commerce platform is informative.” This scale was adapted from related studies [12, 13, 26], which measured the perceived ability of embodied virtual agents and chatbots in multi-agent technological sources.

3.3.3. Perceived Trust Benevolence toward M-Commerce Platform

Adapted from the scale used in prior works concerning multi-agent interfaces with embodied virtual agents and chatbots [12, 13, 26], the five-point Likert scale asked participants to rate the following statements, including “I believe that this m-commerce platform would act in my best interest,” “If I required help, this m-commerce platform would do its best to provide assistance,” and “This m-commerce platform is interested in my well-being and not just its own.”

3.3.4. Perceived Trust Integrity toward M-Commerce Platform

Adapted from the scale used in prior studies on embodied virtual agents and chatbots in multi-agent interfaces [12, 13, 26], the five-point Likert scale asked participants to rate the following statements, such as “This m-commerce platform is truthful in its dealing with me,” “I would characterize this m-commerce platform website as being honest,” “This m-commerce platform keep its commitments,” and “This m-commerce platform is sincere and genuine.”

3.3.5. Perceived Social Presence in M-Commerce Platform

Adapted from the scale used in research on multi-agent interfaces [12, 58], the measure asked participants to rate the following statements on a five-point Likert scale, such as “I can feel a sense of human contact with the m-commerce platform,” “I can feel a sense of human sociability with the m-commerce platform,” “I can feel a sense of human warmth with the m-commerce platform,” and “I can feel a sense of human sensitivity with the m-commerce platform.”

3.3.6. Purchase Intention through M-Commerce Platform

Similar to the assessment utilized in the studies on embodied virtual agents and chatbots in multi-agent interfaces [12, 13, 26], the five-point Likert scale contained the following items to be rated by participants, including “I will consider buying a product through the m-commerce platform,” “I would like to try a product from the m-commerce platform,” and “I will surely buy a product from the m-commerce platform.”

3.4. Pilot Study Assessing Reliability of Dependent Measures

A pilot study was conducted to check the reliability of the dependent measures. Thirty-three recruited participants (none of whom participated in the main experiment) watched the stimulus videos and then completed the survey measuring perceived competence of chatbot, perceived ability of m-commerce platform, perceived benevolence of m-commerce platform, perceived integrity of m-commerce platform, perceived social presence, and purchase intention through the m-commerce platform. Reliability analyses showed that all variables had α values above 0.7, indicating that the dependent measures had acceptable to good internal consistency.
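To make the reliability criterion concrete, the following is a minimal sketch of how an α coefficient could be computed for one multi-item scale, assuming that α here denotes Cronbach's α. The function name `cronbach_alpha` and the ratings in `pilot` are hypothetical illustrations; the pilot data and the software the authors used are not reported in this section.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical five-point Likert ratings: 5 respondents x 3 items.
pilot = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(pilot)
print(f"alpha = {alpha:.3f}, acceptable = {alpha > 0.7}")
```

Under this sketch, a scale would pass the 0.7 threshold when its items vary together far more than they vary independently.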

3.5. Research Design and Online Experiment

This study employed a between-subjects online experimental design wherein participants were exposed to either the multi-chatbot or single-chatbot interface design video. Due to the current restrictions on physical meetings, the experiment was conducted using Alchemer, an online survey platform. We inserted a script in the survey platform informing participants that they were about to view a video simulating textual user-chatbot conversation within a multi-product category platform. The script also reminded participants to imagine themselves as the user in the simulated interaction during the video presentation and that they would need to fill out a survey after the video display.

The online survey tool afforded systematic A/B randomization, and hence, participants were randomly presented with either the multi-chatbot or single-chatbot interface design video. After the video presentation, the survey platform prompted the participants to answer the items measuring perceived competence of chatbot, perceived ability of m-commerce platform, perceived benevolence of m-commerce platform, perceived integrity of m-commerce platform, perceived social presence, and purchase intention through the m-commerce platform.
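The random assignment step can be sketched as follows. This is only an illustration of between-subjects assignment to two conditions with balanced group sizes; how the survey platform implements its A/B randomization internally is not described here, and the function name `assign_conditions`, the seed, and the participant IDs are hypothetical.

```python
import random

CONDITIONS = ("multi-chatbot", "single-chatbot")


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to two conditions with near-equal group sizes."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Alternate conditions over the shuffled order so sizes differ by at most one.
    return {pid: CONDITIONS[i % 2] for i, pid in enumerate(ids)}


# Hypothetical participant IDs 1..10.
assignment = assign_conditions(range(1, 11), seed=42)
for pid in sorted(assignment):
    print(pid, assignment[pid])
```

Each participant sees exactly one video, which is what makes the design between-subjects.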

The authors of this study extended a word-of-mouth invitation to undergraduates majoring in business and information science at a large Asian university that uses English as the medium of instruction. We set the online survey tool to disallow access from mobile devices (such as smartphones and tablets) to ensure that the videos were viewed at the appropriate frame size. Further, the online tool was set so that the survey could be completed only once per computer, to prevent duplicate or repeated attempts. Participants accessed the online survey platform through a given link over four days, yielding a total of 192 completed responses.

3.6. Data Analyses
3.6.1. Filter

Based on two exclusion criteria, we excluded data from participants who might have been inattentive or experienced technical issues during the online survey. First, we discarded data from participants who answered either of the two post-video questions incorrectly: 1) State the number of the product(s) offered in the m-commerce platform, and 2) State the number of the chatbot(s) featured in the m-commerce platform. Second, based on the timestamp records, we dropped data from participants whose time on the video page was shorter than the respective video duration (indicating that the video was not viewed fully or at all). In total, data from 38 participants were removed from the 192 completed responses. Thus, the subsequent data analyses were conducted with 154 valid responses.
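The two exclusion criteria can be sketched as a simple filter. This is an illustration only: the record fields, the answer keys, and the video durations below are placeholders chosen for the example, since the actual values are not reported in this section.

```python
from dataclasses import dataclass

@dataclass
class Response:
    condition: str             # "multi" or "single"
    products_answer: int       # answer to post-video question 1
    chatbots_answer: int       # answer to post-video question 2
    video_page_seconds: float  # time spent on the video page

# Placeholder answer keys and durations (hypothetical, not from the study).
CORRECT_PRODUCTS = {"multi": 3, "single": 3}
CORRECT_CHATBOTS = {"multi": 3, "single": 1}
VIDEO_SECONDS = {"multi": 180.0, "single": 180.0}

def is_valid(r: Response) -> bool:
    """Keep a response only if both checks are passed and the video was watched fully."""
    answered_correctly = (
        r.products_answer == CORRECT_PRODUCTS[r.condition]
        and r.chatbots_answer == CORRECT_CHATBOTS[r.condition]
    )
    watched_fully = r.video_page_seconds >= VIDEO_SECONDS[r.condition]
    return answered_correctly and watched_fully

responses = [
    Response("multi", 3, 3, 200.0),   # kept
    Response("single", 3, 1, 185.0),  # kept
    Response("single", 3, 2, 190.0),  # dropped: wrong chatbot count
    Response("multi", 3, 3, 90.0),    # dropped: left the video page early
]
valid = [r for r in responses if is_valid(r)]
print(len(valid), "of", len(responses), "hypothetical responses retained")

# Counts reported in the text: 192 completed responses minus 38 exclusions.
retained = 192 - 38
print(retained)
```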

3.6.2. Participant’s Profile

The participants were all aged between 20 and 25 years, and the gender ratio was 46% male and 54% female. Concerning e-commerce experience, 32.5% of the participants had less than one year of e-commerce experience, 43.5% had between one and five years, and 24% had more than five years.

3.6.3. Descriptive Data

Table 1 illustrates the means and standard deviations of the dependent variables between the multi-chatbot interface and the single-chatbot interface.

3.6.4. Main Analyses

We conducted a series of 2 (participant’s gender: male vs. female) × 2 (m-commerce chatbot interface: multi-chatbot vs. single-chatbot) ANOVAs for perceived chatbot competence, social presence, and purchase intention, respectively. Because the three trust variables were conceptually related, a 2 (participant’s gender: male vs. female) × 2 (m-commerce chatbot interface: multi-chatbot vs. single-chatbot) MANOVA was conducted for m-commerce platform trust ability, m-commerce platform trust benevolence, and m-commerce platform trust integrity. For the two-way ANOVAs and the planned comparisons through one-way ANOVAs, Levene’s test for homogeneity of variances showed significance values greater than .05 for each analysis; hence, the homogeneity of variance assumption was not violated. Likewise, for the two-way MANOVA, Box’s Test of Equality of Covariance Matrices yielded p > .001, and Levene’s test for homogeneity of variance yielded p > .05 for each dependent variable. These values, therefore, affirmed that the assumption of homogeneity of variance-covariance matrices and the assumption of equality of variance for each variable were not violated.

3.6.5. Perceived Chatbot Competence

Concerning perceived chatbot competence, the results showed a significant interaction effect between the chatbot interface types and the user’s gender on the dependent variable [F(1,150) = 7.64, p < .01, ηp2 = .05]. Additional one-way ANOVAs split by the user’s gender demonstrated that, for male participants in particular, the single-chatbot m-commerce interface led to significantly higher perceived chatbot competence than the multi-chatbot m-commerce interface [F(1,69) = 6.39, p = .01, ηp2 = .09]. The main effect of the chatbot interface types on perceived chatbot competence was non-significant [F(1,150) = 1.25, p = .52, ηp2 = .008].
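As a reading aid, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to two of the F values reported above; the function name is illustrative, and this is not the software output used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Interaction effect on perceived chatbot competence: F(1,150) = 7.64.
interaction = partial_eta_squared(7.64, 1, 150)
# Main effect of interface type on perceived chatbot competence: F(1,150) = 1.25.
main_effect = partial_eta_squared(1.25, 1, 150)

print(round(interaction, 2), round(main_effect, 3))
```

Both values agree with the reported effect sizes after rounding.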

3.6.6. Trusting Beliefs toward the M-Commerce Platform

The two-way MANOVA for trusting beliefs toward the m-commerce platform (ability, benevolence, and integrity) indicated a significant difference between the chatbot interface types on the combined dependent variables [F(3,148) = 6.31, Λ = .887, p < .001, η2 = .113]. This warranted further examination of the between-subjects effects for each dependent variable. Following a Bonferroni adjustment of the alpha level for three dependent variables, we considered the between-subjects effects significant at p < .017 [74]. The results revealed a significant main effect of the m-commerce chatbot interface types on platform trust ability [F(1,150) = 17.87, p < .001, ηp2 = .106]; specifically, participants attributed significantly higher trust ability toward the single-chatbot m-commerce interface than the multi-chatbot m-commerce interface. Although the differences did not reach statistical significance at the adjusted alpha level, participants in the single-chatbot m-commerce interface condition reported marginally higher trust benevolence toward the m-commerce platform [F(1,150) = 4.615, p = .033, ηp2 = .03] as well as marginally higher trust integrity toward the m-commerce platform [F(1,150) = 5.702, p = .018, ηp2 = .037] than those in the multi-chatbot m-commerce interface condition.
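The adjusted threshold and the multivariate effect size can be checked with a few lines of arithmetic. The sketch below assumes a familywise alpha of .05 divided equally across the three trust variables, and uses the identity η2 = 1 − Λ, which holds when the effect has a single degree of freedom, as is the case for a two-level factor. Variable names are illustrative.

```python
# Bonferroni adjustment: familywise alpha split across three dependent variables.
alpha_family = 0.05
n_dependent_variables = 3
alpha_adjusted = alpha_family / n_dependent_variables

# Reported p values for the two effects that fell short of the adjusted threshold.
reported_p = {"benevolence": 0.033, "integrity": 0.018}
significant = {name: p < alpha_adjusted for name, p in reported_p.items()}

# Multivariate effect size from Wilks' lambda (single-df effect assumed).
wilks_lambda = 0.887
multivariate_eta_squared = 1 - wilks_lambda

print(round(alpha_adjusted, 3), significant, round(multivariate_eta_squared, 3))
```

Under these assumptions, the integrity effect (p = .018) narrowly misses the adjusted threshold of about .017, which is why it is described as marginal rather than significant.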

3.6.7. Purchase Intention through the M-Commerce Platform

With respect to purchase intention through the m-commerce platform, the results showed a significant interaction effect between the chatbot interface types and the user’s gender on the dependent variable [F(1,150) = 5.84, p = .01, ηp2 = .04]. Additional one-way ANOVAs split by the user’s gender demonstrated that, for male participants in particular, the single-chatbot m-commerce interface led to significantly higher purchase intention through the m-commerce platform than the multi-chatbot m-commerce interface [F(1,69) = 4.81, p = .03, ηp2 = .06].

3.6.8. Perceived Social Presence

Concerning perceived social presence, the results indicated a significant main effect of m-commerce chatbot interface types on perceived social presence [F(1,150) = 5.70, p = .02, ηp2 = .04]; specifically, participants who interacted with the single-chatbot m-commerce interface reported significantly higher perceived social presence than those who interacted with the multi-chatbot m-commerce interface. The interaction effect between the chatbot interface types and the user’s gender was non-significant [F(1,150) = 2.78, p = .098, ηp2 = .018].

3.7. Mediation Analyses

The main analyses indicated that the male participants ascribed significantly higher perceived chatbot competence when they engaged with the single-chatbot interface than with the multi-chatbot interface. Based on the literature demonstrating the association between source credibility of chatbot/agent and purchase intention [11, 12, 26, 43], we examined whether perceived chatbot competence mediated the relationship between chatbot interface types (multi-chatbot vs. single-chatbot) and intention to purchase through the platform among the male participants. The PROCESS-macro mediation analysis using the indirect bootstrapping method with 5000 bootstrap resamples (Preacher & Hayes, 2004) affirmed the mediating role of perceived chatbot competence in the effects of chatbot interface types on intention to purchase through the platform among the male participants, B = -.16, 95% CI [-.36, -.03]. Furthermore, when controlling for perceived chatbot competence, the direct effect of chatbot interface types on intention to purchase through the platform became non-significant, B = -.21, t(69) = -1.28, p = .20, indicating that perceived chatbot competence fully mediated the relationship for the male participants. Taken together, for the male participants, the single-chatbot interface (versus the multi-chatbot interface) enhanced perceived chatbot competence, which in turn led to higher intention to purchase through the platform.

The previous comparison analyses revealed that the single-chatbot interface led to significantly higher perceived trust ability of the platform and a significantly higher sense of social presence across all participants. Given these findings, we next explored whether perceived social presence mediated the effects of chatbot interface types on perceived trust ability of the platform, based on the literature showing that social presence from anthropomorphic agents can influence trusting beliefs toward online platforms [12, 46, 53, 54]. The PROCESS-macro mediation analysis using the indirect bootstrapping method with 5000 bootstrap resamples supported the mediating role of perceived social presence in the effects of chatbot interface types on the trust ability of the platform, B = -.09, 95% CI [-.21, -.02]. When controlling for perceived social presence, the direct effect of the chatbot interface types on trust ability remained significant, B = -.29, t(152) = -3.34, p < .01, indicating that perceived social presence partially mediated the relationship. In other words, the single-chatbot interface (versus the multi-chatbot interface) induced a stronger sense of social presence in the platform, which partially contributed to higher trust toward the platform’s ability among the participants. Table 2 summarizes this study’s findings.
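To make the logic of the bootstrapped indirect effect concrete, the following is a minimal sketch of a percentile bootstrap for a simple mediation model (X → M → Y). It is not the PROCESS macro and may differ from the exact settings used in the analyses above. The data are synthetic, and the coding of the interface condition (0 = single-chatbot, 1 = multi-chatbot), the sample size, and the generating coefficients are assumptions made only for illustration.

```python
import random

def centered_cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """a * b: a is the slope of M on X; b is the slope of Y on M, controlling for X."""
    sxx, smm, sxm = centered_cross(x, x), centered_cross(m, m), centered_cross(x, m)
    sxy, smy = centered_cross(x, y), centered_cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=5000, seed=0):
    """Percentile 95% confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (for example, only one condition drawn)
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Synthetic data (hypothetical): X = interface condition, M = mediator, Y = outcome.
rng = random.Random(42)
x = [i % 2 for i in range(71)]
m = [5.0 - 0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 + 0.5 * mi + rng.gauss(0, 1) for mi in m]

point = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Mediation is supported in this framework when the bootstrap confidence interval for the indirect effect excludes zero, which is the criterion applied to the intervals reported above.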

4. Discussion

This study revealed that participants reported a stronger sense of social presence and stronger trusting beliefs toward the m-commerce platform when engaging with the single-chatbot interface than with the multi-chatbot interface. For males in particular, the single-chatbot interface (vs. the multi-chatbot interface) also led to enhanced perception of chatbot competence and higher intention to purchase through the m-commerce platform. Framing chatbots as domain-specific product advisors in the multi-chatbot interface failed to trigger an appropriate specialization schema in participants’ minds and hence did not lead participants to categorize the chatbots as specialists. Thus, this study, together with prior research [14, 30, 31], challenges the robustness of single functionality as a specialty cue for prompting users to categorize and perceive domain-specific agents as specialists.

A finer-grained analysis of the literature suggests some explanations for this observation. In particular, the effect of single functionality as a specialty cue can be reinforced with authoritative labels/social descriptors accentuating specialization. Per the categorization theory [75], users across two experiments attributed greater perceived agent competence when social descriptors signaling specialization (“Hi, I am Anna, your camera product specialist”) were affixed to the product-specific agents [12, 13]. However, in this study, the product-specific chatbots in the multi-chatbot interface did not present themselves as a “product specialist” or “product expert.” Instead, they were scripted to introduce themselves as “dedicated product advisor for [product category],” which might not have conveyed a strong enough cue for prompting the participants to categorize them as product specialists. Hence, the mere assignment of chatbots to advise on specific product domains in the multi-chatbot interface, devoid of authoritative labels/social descriptors signaling specialization such as “product specialist” or “product expert,” was ineffective in producing a social cue that triggers a specialization schema in the user’s mind. This could account for the lack of positive effects of single functionality as a specialty cue in multi-agent studies incorporating domain-specific agents without labels/social descriptors accentuating specialization [30, 31]. Specifically, researchers in a similar study [30] implied that the labels affixed to the voice agents, i.e., “SIRI” and “SORI,” were novel to users and held no cues for prompting users to categorize the entities as specialists or generalists.

Why did the single-chatbot m-commerce interface evoke higher perceived chatbot competence and greater intention to purchase through the m-commerce platform, particularly among the male participants? The mere assignment of chatbots to advise on specific product domains without robust labels/social descriptors highlighting specialization could have prompted the male participants to appraise these chatbots as novice product advisers possessing limited product domain knowledge. On the other hand, the generalist chatbot in the single-chatbot interface was socially perceived by the male participants as more competent, based on her broader expertise in dispensing advice across diverse product categories. Per the mediation analyses, the higher perceived chatbot competence induced by the single-chatbot interface (vs. multi-chatbot interface) led to higher purchase intention through the platform among the males. This observation supports the notion that the source credibility of agents (agent’s competence or expertise) can drive the user’s intention to use the agent-infused platforms [11, 13, 27, 43].

Why did the single-chatbot interface evoke higher trusting beliefs (ability, benevolence, and integrity) toward the m-commerce platform than the multi-chatbot interface? The mere stratagem of designating chatbots to advise on specific product categories (without labels as specialty cues) was ineffective in invoking specialization heuristics in participants’ minds; hence, the multi-chatbot interface did not convey positive social cues for accentuating source credibility or the reputation of the m-commerce platform. Without source credibility cues to function as “proof sources,” participants could not easily verify the expertise and the trustworthiness of the multi-chatbot m-commerce platform [11, 12, 26, 41, 43], thereby leading them to attribute weaker trusting beliefs to the multi-chatbot m-commerce platform. Recent studies have found that users felt more confused when interacting with multiple agents [15] and favored interacting with one consistent agent over many agents, as multiple sources can impose extraneous cognitive load [30]. Further, the participants could have perceived the multi-chatbot interface as unfamiliar and different from the market’s standard interface. The unfamiliarity of the multi-chatbot interface may have been compounded by confusion and added cognitive load as the participants attended to the different chatbots. In this regard, these unfamiliarity cues could have harmed trusting beliefs toward the multi-chatbot m-commerce platform.

Grounded in the multiple source effect theory [62], studies have shown that multiple voice sources can induce a stronger sense of social presence in users than single-voice interfaces [12, 13, 28]. This study demonstrated the reverse effect on perceived social presence (the single-chatbot interface led to higher social presence than the multi-chatbot interface). Hence, this study did not support the multiple source effect with text-cued chatbots. Plausibly, the single-chatbot interface (versus the multi-chatbot interface) led to a higher sense of social presence because a single agent following users across diverse domains naturally affords a longer user-agent interaction than a multi-agent interface in which each of the many agents has a briefer interaction with users [31]. The single-chatbot interface might have allowed the participants to experience a longer user-chatbot interaction because the same chatbot was present throughout the recommendation process across the many product categories, thereby eliciting the perception that the m-commerce platform was imbued with human-like social qualities. Conversely, the participants engaged with the multi-chatbot interface may have had inadequate time to sense the social presence afforded by each chatbot, as the user-chatbot conversation passed rapidly from one chatbot to the next. The mediation analyses revealed that the enhanced social presence felt by the participants in the single-chatbot interface condition partially explained the effects of the single-chatbot interface (vs. the multi-chatbot interface) on increased trust ability toward the m-commerce platform. To some degree, this aligns with the extant literature showing that the sense of social presence derived from an anthropomorphic agent’s presence and cues is positively associated with the user’s trust toward the agent-infused platform [12, 13, 28, 41, 45, 46, 54, 55].

There were some gender effects in this study. Only males were affected by the different chatbot interfaces concerning perceived chatbot competence and intention to purchase through the platform. This finding could be attributed to the theoretical perspective that male online shoppers emphasize utilitarian and convenience qualities that deliver practical value, whereas females focus on trustworthiness and the ability to interact socially within digital commerce platforms [76–78]. Moreover, utilitarian values are often associated with the qualities of the information and interaction afforded by recommender agents as message sources [66, 79, 80]. Possibly, the effects of the chatbot interface type on perceived chatbot competence and intention to purchase through the platform were more pronounced for males than for females because males focused more attentively on the chatbot’s source credibility cues to appraise the practical and convenience values of the m-commerce platform, which then drove their decision to patronize the system.

5. Design and Managerial Implications

This study implies that assigning chatbots to specific domains may not be adequate for prompting users to categorize the chatbots as product specialists. Thus, we recommend combining the chatbot’s unique product assignment with other cues, such as labels (“wine specialist”) [29, 81], dialogues such as “I am Adrienne, a home decoration specialist advisor!” [12, 13, 26], and social attributes including gender or age [25, 43], to activate specialization heuristics in users’ minds. Without specialty cues, designers should favor a single-chatbot interface over a multi-chatbot interface because the latter can induce unfamiliarity, confusion, and unnecessary cognitive load in users [14, 15, 30]. A single-chatbot interface that follows users throughout the product recommendation process is likely to evoke firmer trusting beliefs toward the m-commerce platform and a stronger sense of social presence than a multi-chatbot interface featuring different chatbots as domain-specific agents within the user-chatbot experience.

M-commerce managers and e-tailers should be aware that users react socially to chatbots. In light of this, besides the functional and technical design aspects, chatbot social design should be considered carefully, as social cues can influence users’ perceived source credibility, trust, and purchase intention. Moreover, m-commerce managers and e-tailers should note the user’s gender differences in m-commerce expectations and experiences; and that males and females may approach user-chatbot engagements differently. Specifically, males may focus on practical and convenience values while females may emphasize social and emotional values in m-commerce. Hence, m-commerce managers and e-tailers can incorporate different chatbot interfaces to conform with the expectations of male and female shoppers.

6. Limitations and Recommendations for Future Research

Some limitations of this study are acknowledged. First, the sample consisted of undergraduates at an Asian university, thereby limiting the generalizability of the findings to a broader group of users. Hence, replications of this research should include respondents with different demographic profiles. Second, per the approach used in prior human-agent interaction studies [71, 72], this experiment used videos simulating user-chatbot engagement as stimuli for assessing users’ perceptions and use intention concerning the multi-chatbot and single-chatbot interface designs. This approach might have produced results that failed to capture nuanced perceptions indicative of users’ actual engagement with chatbots in natural settings. To increase the generalizability of experimental findings to authentic user-chatbot engagement, future studies can utilize existing chatbot platforms [14] or Wizard-of-Oz experiments [15] to establish a more genuine user-chatbot interaction. Third, the present experiment involved transient simulations of user-agent interaction and, thus, might have captured users’ perceptions and intentions based on initial impressions only. Future research can utilize stimuli of extended duration to ascertain the perceptual and behavioral effects of framing chatbots as domain-specific specialist agents beyond the initial interaction with the entities.

Concerning multi-chatbot or multi-agent interface, future studies can be conducted to tease apart the subtle and cumulative effects of labels/social descriptors and single functionality as specialty cue effects. Based on the proposition that some labels may not produce a strong enough cue for triggering specialization schema, future studies can also examine the effectiveness or “strength” of different parameters of specialty cues, including different terms and phrases in labels. Moreover, the effects of other potential specialty cues apart from derivatives of labels/social descriptors and single functionality should be explored with chatbots or agents within a multi-agent interface, including title (“Dr.”), gender, dressing style, and communication style [25]. Future work should further address the effects of induced confusion and extraneous cognitive load in multi-agent systems. Given that gender of users has been found to affect user-chatbot/agent experience differently, future works on multi-chatbot or multi-agent interfaces should further examine the role of gender and e-commerce/m-commerce experience as potential moderating factors. Last but not least, the effects of multi-agent interface should be explored with other social, psychological, and communication theories across diverse application domains like health, education, counseling, and finance.

Data Availability

Data is available on request.

Conflicts of Interest

The authors report no conflict of interest.

Acknowledgments

We thank Wei Ming Pang for assisting with the online experiment. This research was supported by the Malaysia Ministry of Higher Education (FRGS/1/2021/SS0/MMU/02/8) titled “A Design Framework of E-Commerce Chatbot with Social Cues to Enhance User-Chatbot Experience in Malaysia.”