See you soon again, chatbot? A design taxonomy to characterize user-chatbot relationships with different time horizons

Users interact with chatbots for various purposes and motivations, and for different periods of time. However, since chatbots are considered social actors and given that time is an essential component of social interactions, the question arises as to how chatbots need to be designed depending on whether they aim to help individuals achieve short-, medium-, or long-term goals. Following a taxonomy development approach, we compile 22 empirically and conceptually grounded design dimensions contingent on chatbots' temporal profiles. Based upon the classification and analysis of 120 chatbots therein, we abstract three time-dependent chatbot design archetypes: Ad-hoc Supporters, Temporary Assistants, and Persistent Companions. While the taxonomy serves as a blueprint for chatbot researchers and designers developing and evaluating chatbots in general, our archetypes also offer practitioners and academics alike a shared understanding and naming convention to study and design chatbots with different temporal profiles.

Users' primary motivations to engage with conversational agents are manifold, which is reflected in the variety of conversational agents available, ranging from the popular general-purpose voice assistants SIRI (Apple, Inc.) or ALEXA (Amazon, Inc.) to domain-specific text-based chatbots like the mental health chatbot WOEBOT or the scheduling assistants AMY and ANDREW (x.ai, Inc.). Another basic difference concerns whether users intend to interact with a chatbot only once (for instance, with an e-service chatbot helping users to find a specific product; Chung, Ko, Joung, & Kim, 2020) or for multiple, continuous interactions over longer periods of time, such as with a healthcare chatbot supporting patients to manage a chronic disease (Kowatsch et al., 2018). Thus, designing chatbots fundamentally hinges on their "temporal profile", which encompasses the prospective time horizon of using the chatbot as well as the duration and frequency of individual interactions throughout the entire user-chatbot relationship (Baraka, Alves-Oliveira, & Ribeiro, 2020).
However, despite diverse chatbot characteristics that have previously been investigated with regards to consequential design implications, for example, whether chatbots serve general or domain-specific purposes (Gnewuch, Morana, & Maedche, 2017) or whether chatbots are intended to engage in dyadic one-to-one or in multiparty interactions (Seering, Luria, Kaufman, & Hammer, 2019), there is a scarcity of empirical research on design differences contingent on chatbots' temporal profiles.
Therefore, the goal of the current work is to determine whether and how chatbots' different temporal profiles affect design considerations. The following two research questions guide our work: "Which design elements allow us to distinguish chatbots depending on whether they are aimed to help individuals to achieve short-, medium-, or long-term goals?" (RQ1) and "How does a chatbot's temporal profile affect its design?" (RQ2).
Drawing attention to temporal aspects in user-chatbot relationships and examining the contingency of design choices on a chatbot's temporal profile will be of significant benefit both to practitioners using chatbots and to managers in the early stages of chatbot development, guiding the decision on which design elements need to be tailored to the time horizon of the user-chatbot relationship and which design elements are negligible or cumbersome.
This work also raises novel theory-related questions for human-computer interaction and (computer-mediated) communication researchers: If chatbots are used for one-time-only conversations, users will likely seek to get something done quickly via the chatbot, which makes the chatbot a mere "communication medium" (Zhao, 2006, p. 402). In contrast, if chatbots are used to achieve a specific personal long-term goal, users will rather be committed to undergoing longer personal learning or development processes together with the chatbots, which emphasizes the notion of chatbots as "social actors" (Reeves & Nass, 1996).
In the following, we review human-computer interaction and computer-mediated communication literature on time-dependent design aspects and provide a preliminary definition of short-, medium-, and long-term chatbots. To answer our first research question, we then develop a taxonomy of time-dependent design aspects of chatbots (Section 3). The taxonomy development process follows the widely used taxonomy development procedure suggested by Nickerson, Varshney, and Muntermann (2013) and consists of two conceptual-to-empirical and five empirical-to-conceptual iterations. In total, we classify 37 chatbots described in research articles and 83 chatbots in the real world. To answer our second research question, we analyze and compare design characteristics of the classified chatbots systematically regarding their temporal profile (Section 4.1). Based on this analysis, we propose three chatbot archetypes (i.e., Ad-hoc Supporters, Temporary Assistants, and Persistent Companions), which allow researchers and practitioners to account for time-dependent aspects in chatbot design and thus provide common ground for further work (Section 4.2). Finally, we discuss broader implications of chatbots' temporal profiles and outline limitations and recommendations for further research (Section 5).

Conceptual background
The focal point of our research is time-dependent design aspects of domain-specific text-based conversational agents, here referred to as "chatbots". For the purpose of our analysis, we define chatbots as "software-based systems designed to interact with humans via text-based natural language" (Feine, Adam, Benke, Maedche, & Benlian, 2020, p. 127) that mimic common human-human conversations (Araujo, 2018) within the boundaries of a specific domain-knowledge (Gnewuch et al., 2017).
Depending on their time horizon configuration, domain-specific chatbots can be characterized as short-term, medium-term, long-term, or life-long chatbots (Baraka et al., 2020). Chatbots designed for helping individuals to achieve short-term goals are defined by a single or very few occasional short interaction(s), while the latter (that is, chatbots designed for supporting individuals in achieving medium- to long-term, or even life-long goals) comprise multiple (interdependent) interactions over a certain period (Baraka et al., 2020, p. 29). Typical examples for short-term relationships are chatbots offering brief ad-hoc services such as customer support (e.g., JAEGER-LECOULTRE) or self-diagnosis healthcare chatbots such as BABYLON or GYANT, whereas typical medium- and long-term examples are chatbots for monitoring chronic conditions (e.g., WOEBOT) or learning processes (e.g., DUOLINGO).
Communication researchers have examined the role of time in social interactions in face-to-face (Werner, Altman, & Brown, 1992) and computer-mediated communication (Hesse, Werner, & Altman, 1988), for example, in reference to how relationship building processes in groups or between intimate partners develop through identifiable steps or stages (Hesse et al., 1988), or the impact of chronemic cues on perceived sender's intimacy (Walther & Tidwell, 1995). Construing chatbots as social actors (Ho, Hancock, & Miner, 2018) that can act on their own as "novel, human-created communication entities, playing their own social role" (Hoorn, 2018, p. 1) implies similar relationship-building processes between users and chatbots under the Media Equation Theory umbrella (Reeves & Nass, 1996). However, to account for such relationship processes in short or longitudinal user-chatbot relationships, chatbots likely need to be equipped with specific features or design elements to meet user expectations.
Consequently, some research groups have dedicated their work to understanding longitudinal relationship-building processes with chatbots (e.g., Bickmore & Picard, 2005). In human-robot interaction research, different temporal profiles of robots have long been acknowledged to be a major design characteristic that has crucial implications for their interactions with users (Baraka et al., 2020; Shibata, 2004; Yanco & Drury, 2004). A robot's temporal profile can be characterized by the following time-dependent dimensions: the time horizon as the total period during which the user engages with a robot, the duration of (individual) interaction(s), and the frequency in the case of multiple interactions (Baraka et al., 2020). The fourth dimension in human-robot interaction research concerns synchronicity, which describes whether a (remotely controlled) robot responds immediately (synchronously) or delayed (asynchronously) when it is located in a more distant place. Considering robots as chatbots at the physical extreme of the "reality-virtuality continuum" (De Keyser, Köcher, Alkire, Verbeeck, & Kandampully, 2019) allows transferring some of those insights to virtual chatbots as well. While synchronicity seems less applicable to messenger-based chatbots, which are virtually available around the clock, time horizon, duration, and frequency are relevant dimensions to distinguish chatbots with different temporal profiles.
Conversely, while one scientific study of 103 domain-specific real-world chatbots found that the vast majority (84%) of them were developed for short-term purposes (Janssen, Passlick, Rodríguez Cardona, & Breitner, 2020), it is particularly difficult to find research on chatbots with a short-term temporal profile: A SCOPUS search on August 27, 2020, for academic material on the search string ("long*term" AND ("chatbot*" OR "conversational agent*" OR "relational agent*")) revealed 178 documents of which 89 (50.0%) actually dealt with chatbots that are built to foster and maintain long-term relations with users. The same search string but looking for "short*term" instead yielded 99 results of which only seven (7.1%) covered chatbots that are developed to support short-term goals. One reason might be limited research on short-term chatbots. Another might be that it is more difficult to identify relevant academic literature on the topic of chatbots developed for short-term purposes because it is not called or tagged as such, making it cumbersome for researchers and practitioners alike to compare and derive design and evaluation guidelines when developing chatbots for short-term goals.
In line with the two research questions, this work's objective is thus twofold: First, to identify all design elements contingent on the temporal dimension of user-chatbot relationships and to develop a comprehensive design taxonomy that allows us to characterize user-chatbot relationships with different time horizons (RQ1), and, second, to quantitatively assess differences between chatbots for either short-, medium-, or long-term purposes and to illustrate typical design configurations by identifying three chatbot archetypes (RQ2).

Methodology
To answer our two research questions, we applied a mixed-methods research approach (Creswell & Clark, 2011) and combined (a) qualitative methods to develop (part 1) and evaluate (part 2) the resulting "Design Taxonomy for Chatbots with Different Temporal Profiles" (RQ1) and (b) quantitative methods to identify differences in the occurrence of design characteristics in chatbots with different temporal profiles and to develop and differentiate three distinct chatbot archetypes aimed at helping users to achieve short-, medium-, or long-term goals (RQ2). The entire research procedure is illustrated in Fig. 1. The applied methodologies in each part and iteration are summarized in Table 1 and the following sections.

Part 1: Taxonomy development
To answer our first research question ("Which design elements allow us to distinguish chatbots depending on whether they are aimed to help individuals to achieve short-, medium-, or long-term goals?"), we develop a taxonomy of design elements to classify and differentiate the design of chatbots taking into account their temporal profile.
Taxonomies are well renowned in information systems and human-computer interaction research (Nickerson et al., 2013) as they allow the development of design principles that can inform the design of future artifacts (e.g., chatbots) based on the empirical analysis of structural patterns in existing artifacts (Williams, Chatterjee, & Rossi, 2008). A taxonomy consists of a number of dimensions, each of which has a subset of at least two characteristics. Every object that is classified based on the taxonomy must have exactly one characteristic of each dimension, neither more nor less (Nickerson et al., 2013).
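The structural rule just stated (at least two characteristics per dimension, exactly one characteristic per dimension for every classified object) can be illustrated with a minimal Python sketch. The class `Dimension`, the function `validate_classification`, and the classified example chatbot are hypothetical names introduced here for illustration only; they are not part of the taxonomy method by Nickerson et al. (2013). The two dimensions are taken from the taxonomy described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One taxonomy dimension with its mutually exclusive characteristics."""
    name: str
    characteristics: tuple

    def __post_init__(self):
        # A dimension needs at least two distinct characteristics to discriminate objects.
        if len(set(self.characteristics)) < 2:
            raise ValueError(f"{self.name}: needs at least two characteristics")


def validate_classification(taxonomy, classification):
    """Return a list of violations; an empty list means the classification is valid.

    `classification` maps each dimension name to a single characteristic, so
    "not more than one" is enforced by the mapping itself.
    """
    problems = []
    for dim in taxonomy:
        value = classification.get(dim.name)
        if value is None:
            problems.append(f"{dim.name}: no characteristic assigned")
        elif value not in dim.characteristics:
            problems.append(f"{dim.name}: unknown characteristic {value!r}")
    known = {dim.name for dim in taxonomy}
    for name in classification:
        if name not in known:
            problems.append(f"{name}: not a dimension of the taxonomy")
    return problems


# Two dimensions named in the text; the classified chatbot is hypothetical.
taxonomy = [
    Dimension("time horizon", ("short-term", "medium-term", "long-term", "life-long")),
    Dimension("socio-emotional behavior", ("not present", "present")),
]
example_bot = {"time horizon": "short-term", "socio-emotional behavior": "present"}
print(validate_classification(taxonomy, example_bot))
```

An empty list indicates that the object carries exactly one valid characteristic per dimension; any missing or unknown entry is reported by name.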
Following the established taxonomy development method proposed by Nickerson et al. (2013), our taxonomy builds on existing conceptual design frameworks (Iterations 1 and 4) and extends them based on empirical observations of chatbots described in scientific articles (Iteration 2) and deployed in practice (Iterations 3 and 7). The conceptual-to-empirical iterations ensure that the taxonomy builds on and extends the latest knowledge discussed in the scientific literature, in particular, the taxonomy of design elements for domain-specific chatbots proposed by Janssen, Passlick, Rodríguez Cardona, and Breitner (2020); the empirical-to-conceptual iterations and the analysis and classification of the design of short-, medium-, and long-term chatbots introduced in scientific articles and practice increase rigor, relevance, and generalizability of the taxonomy. Table 1 provides an overview of the applied research approaches, methodologies, and the sample of analyzed chatbots tested in each particular iteration.
Before conducting the first taxonomy development iteration, Nickerson et al.'s (2013) taxonomy development procedure requires the definition of (a) a purpose of the taxonomy, and the determination of (b) a meta-characteristic as "the most comprehensive characteristic that will serve as the basis for the choice of [all other] characteristics in the taxonomy [that are based] on the purpose of the taxonomy and in turn based on the users and their expected use of the taxonomy" (Nickerson et al., 2013, p. 343) and (c) ending conditions, that determine when the taxonomy development is completed.
Specifically, the (a) purpose of our taxonomy is to provide a framework of design guidelines for chatbots that support individuals in their short-, medium-, and long-term goals. Therefore, the (b) meta-characteristics of our taxonomy are all design elements that have a visible or experiential impact on the user-chatbot interaction.
Regarding the determination of (c) ending conditions, we adopted all objective and subjective conditions suggested by Nickerson et al. (2013). Throughout the entire taxonomy development process, it was constantly discussed and checked in each iteration whether all identified design dimensions and characteristics fulfilled all ending conditions (cf. Table A7 in the Web Appendix).
In total, the development of the taxonomy required two conceptual-to-empirical and four empirical-to-conceptual development iterations before all ending conditions were fulfilled (cf. Web Appendix Table A7). Summarized insights into each iteration of the taxonomy development are further outlined in the following subsections. Fig. 2 visualizes all changes on the design dimension level across iterations throughout the taxonomy development process.

Iteration 1 - Conceptual-to-empirical: Identification of chatbot design elements
In the first conceptual-to-empirical iteration, which we conducted between November and December 2019, we reviewed published chatbot design classifications and frameworks based on a narrative literature review approach. Since classification schemes and naming conventions for chatbots are fragmented along different thematic axes (e.g., multiple vs. single-user chatbots), use cases (e.g., healthcare vs. shopping chatbots), and across multiple research disciplines (e.g., information systems vs. human-computer interaction), a narrative literature review proves useful and efficient for establishing an overview of the latest developments in a condensed format (Vom Brocke et al., 2015) and to derive an initial set of design dimensions before we set out to run the first empirical-to-conceptual iteration.
The initial set of conceptually identified design dimensions is visible in Fig. 2 (this manuscript) and Table A1 (cf. Web Appendix). The detailed description of the research procedure, the used databases, and search strings as well as the initial set of design dimensions and characteristics are described in depth in the Web Appendix.

Iteration 2 - Empirical-to-conceptual: Classification of proof-of-concept chatbots described in scientific articles
For the second iteration, we chose an empirical-to-conceptual approach to complement and test our initial set of design dimensions based on published chatbot design and development case studies described in scientific articles (Cooper, 1988; Knote et al., 2018). Concentrating first on chatbots described in scientific articles (cf. Table A3, Web Appendix) allowed us to understand which design dimensions chatbot developers and researchers focused on when developing different types of chatbots. Thereby, in this iteration, we could classify all sampled chatbots described in the studies with regard to the set of design dimensions already identified and could simultaneously look for new design dimensions that the articles' authors explicitly mentioned or discussed, and which we had not identified based on the review of conceptual design frameworks in Iteration 1.
To obtain a comprehensive set of 37 scientific articles that either focused on conceptualizing or developing chatbots in parts or as a whole, we followed a systematic literature review approach (cf. Figure A1 in the Web Appendix), including the use of search strings in scientific databases as well as forward, backward, and similarity search, which "takes a structured approach to identifying, evaluating, and synthesizing research" (Vom Brocke et al., 2015, p. 9). We concentrated the search on chatbots in application domains characterized by processes that show a progressive evolution over time and where we expected to find examples of short-, medium-, and long-term chatbots (i.e., healthcare, education, and business). All parameters of the systematic literature review, search strings, the screening, review and coding procedure, as well as the results of this iteration and changes in the taxonomy are described in detail in the Web Appendix.

Iteration 3 - Empirical-to-conceptual: Classification of real-world chatbots
In this iteration, we chose the empirical-to-conceptual path again. To ensure the relevance of our taxonomy, we aimed at triangulating the sample of short-, medium-, and long-term chatbots described in the latest scientific articles with state-of-the-art examples of actually available chatbots in the real world and systematically sampled a set of chatbots from online chatbot directories such as botlist.co or thereisabotforthat.com and from curated chatbot platforms and magazines (cf. Table A4, Web Appendix). As there are still no standardized procedures that determine how to sample or analyze chatbots "in the wild" (Seering et al., 2019), we describe our systematic approach (i.e., sampling strategy, data selection, coding, and classification procedures) in full detail in the Web Appendix.

Iteration 4 - Conceptual-to-empirical: Refinement of the taxonomy
The publication of a "Taxonomy of Design Elements for Domain-specific Chatbots" by Janssen, Passlick, et al. (2020) on April 6, 2020, allowed us to challenge and further refine our taxonomy in another conceptual-to-empirical iteration. Therefore, we compared both taxonomic structures, all design dimensions, and design characteristics hitherto and identified that we had eleven dimensions in common that were identical or very similar in meaning, four dimensions that had not been included in the aforementioned taxonomy, and five which we had not listed in ours yet. Interweaving and complementing the taxonomies promised a more comprehensive understanding of differences between chatbots with different temporal profiles since it synthesized design dimensions identified based on the analysis of chatbots from different application domains. The merging process is visible in Fig. 2 and described in the Web Appendix in full detail as well.

Iteration 5 - Empirical-to-conceptual: Re-classification of all chatbots based on refined taxonomy
To test and assess the new structure of the taxonomy again, we carried out another empirical-to-conceptual iteration and coded our two chatbot samples from Iterations 2 and 3 based on the taxonomic structure and terminologies retrieved in Iteration 4. Whenever one of the design dimensions could not be assessed based on the stored data, we revisited the chatbots and updated the chat logs accordingly. As a result of continued discussions during this iteration, we substantially reordered the structure of the design dimensions as visible in Fig. 2 (cf. full methodological details in the Web Appendix).

Iteration 6 - Empirical-to-conceptual: Re-classification of chatbots to meet ending conditions
Due to the addition of one new dimension and merging three design characteristics into one in Iteration 5 (cf. Fig. 2), the ending conditions were still not fulfilled in the last iteration (cf. Figure A7 in the Web Appendix), which rendered another empirical-to-conceptual iteration imperative. In this iteration, we classified all chatbots again with a focus on the newly added dimension and the changes in the discussed design characteristics. After this iteration, all ending conditions were fulfilled and the taxonomy development process was complete.

Part 2: Taxonomy evaluation
To confirm that the taxonomy could be applied to other application domains and by individuals not involved in the development of the taxonomy, we conducted an evaluation iteration based on a new sample of real-world chatbots (cf. Table A5 in the Web Appendix). In line with the taxonomy evaluation framework by Szopinski, Schoormann, and Kundisch (2019), this evaluation iteration was characterized by the following directions: Regarding the subject of evaluation (the 'who'), we involved two additional researchers (C & D) with chatbot domain and taxonomy method expertise who had not been involved in the taxonomy development process before. Before starting with the actual evaluation phase, these two researchers provided feedback concerning the interpretation of dimensions and characteristics defined in the taxonomy codebook provided by researchers A & B (cf. Table A2 in the Web Appendix), which led to a refinement of the definition of the frequency of interactions design dimension.
Regarding the method of evaluation (the 'how'), we followed the illustrative scenario technique for which the researchers C & D applied the present taxonomy to a new set of real-world objects based on the last version of the design dimensions' and characteristics' definitions: "Applying a present taxonomy to real-world objects allows researchers to evaluate their […] usefulness for classifying, differentiating, and comparing objects as well as to evaluate their robustness, utility, efficacy, stability, and completeness" (Szopinski et al., 2019, p. 11).
[Table 2 (fragment): D 9 Intelligence quotient: C 9,1 Rule-based knowledge only | C 9,2 Text understanding | C 9,3 Text understanding+; D 10 Personality adaptability: C 10,1 Principal self | C 10,2 Adaptive self; D 11 Socio-emotional behavior: C 11,1 Not present | C 11,2 Present; D 12 Service integration]
Regarding the object of evaluation (the 'what'), we re-applied the 103 real-world chatbots identified and classified by Janssen, Passlick, et al. (2020). An analysis of this sample had revealed that the chatbots were completely disjoint from our sample as Janssen, Passlick, et al. (2020) had applied a different sampling strategy that had focused on sourcing chatbots from the chatbot directories "chatbots.org" and "botlist.org" in May 2019 while we had pursued a purposive sampling strategy to identify the most popular or renowned short- and long-term real-world chatbots per application domain. Additionally, the analysis of the aforementioned sample had revealed that the sample provided a large number of chatbots that could be attributed to the application domains Business, which was less dominant in our sample, and Daily Life, which we, therefore, added as a new design characteristic to the taxonomy as well.
Between September and November 2020, researchers C & D revisited all 103 chatbots. If a chatbot was no longer available via the original URL, the chatbot's (or company's) name was used to search for the chatbot via Google Search. Eventually, only 42 chatbots were still accessible. All other chatbots were either no longer detectable on the websites (e.g., SOFIA (TRAVEL)), the websites were offline (e.g., SOA SEKS CHECK) or the chatbot did not answer anymore (e.g., IFRS ROOKIES). Some chatbots also had been (re-)replaced with live chats with human agents in the meantime (e.g., AXA).
To make sure that all possible design dimensions could be assessed for the remaining 42 chatbots, researchers C & D followed the updated semi-structured conversation guidelines (cf. details on Iteration 3 and Table A6 in the Web Appendix) to engage in conversations with the chatbots. Similar to Iteration 3, chat logs, screenshots, and personal notes were stored in an independent database which was later merged with the previous database as the analysis of the different chatbot archetypes was performed based on the full sample across all iterations.
To check the extent to which the classifications of the two evaluation researchers matched those of previous iterations, all 42 chatbots were also classified again by researcher A, who had also classified all other chatbots in the previous iterations. From this, inter-coder reliability was calculated for the entire taxonomy as well as for each dimension and each inter-coder combination (C & D, C & A, and D & A). All inter-coder reliabilities were above 90% and, thus, considered satisfactory (Kassarjian, 1977). The largest variation appeared in the dimensions D 21 motivation for chatbot use and D 12 service integration, which led to a refinement of their definitions. Overall, the evaluation participants declared the taxonomy useful, complete, and comprehensible. Since no characteristics or dimensions were deleted, added, or split in the evaluation iteration, all ending conditions were fulfilled and the final taxonomy could be confirmed (cf. Table A7, Web Appendix).
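The pairwise reliability check described above can be sketched as follows. This is a minimal illustration that assumes inter-coder reliability is measured as simple percent agreement (matching coding decisions divided by all coding decisions); the text does not spell out the formula, and the function names and the coded values below are hypothetical, not the study's data.

```python
from itertools import combinations


def percent_agreement(codes_a, codes_b):
    """Share of coding decisions on which two coders chose the same characteristic."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def pairwise_agreement(codings):
    """Percent agreement for every pair of coders, keyed by the coder pair."""
    return {
        (first, second): percent_agreement(codings[first], codings[second])
        for first, second in combinations(sorted(codings), 2)
    }


# Hypothetical codes for one dimension across five chatbots (not the study's data).
codings = {
    "A": ["present", "present", "not present", "present", "not present"],
    "C": ["present", "present", "not present", "present", "present"],
    "D": ["present", "present", "not present", "present", "not present"],
}
for pair, agreement in pairwise_agreement(codings).items():
    print(pair, f"{agreement:.0%}")
```

Running the same function per dimension and over all dimensions combined would yield the per-dimension and whole-taxonomy figures mentioned in the text.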

Part 3: Taxonomy application
To answer our second research question (i.e., "How does a chatbot's temporal profile affect its design?"), we (i) analyzed and evaluated the distribution of design characteristics per design dimension and per temporal profile in all 120 chatbots that we had sampled for the taxonomy development (i.e., all 37 chatbots sampled from scientific articles in Iteration 2, all 41 real-world chatbots sampled in Iteration 3, and all 42 chatbots sampled to evaluate the taxonomy in Iteration 7) and (ii) developed an index to abstract three time-dependent chatbot archetypes to better understand differences in the design configuration of short-, medium-, and long-term chatbots.

Frequency analysis of chatbots' time-dependent design characteristics
Since each chatbot was classified by exactly one design characteristic per design dimension (Nickerson et al., 2013), resulting in 2640 codes (22 design dimension codes * 120 chatbots), frequency analysis is a suitable method. Frequency analysis is a "relevant brick to bridge the gap between qualitative and quantitative methods (mixed-methods research)" and can be described "as a process that breaks down complex behaviors into smaller units [by counting] their occurrences" (Rack, Zahn, & Mateescu, 2018, chap. 14, p. 278). We counted occurrences of design characteristics per design dimension and per chatbot temporal profile. Lastly, tests of independence were conducted to detect statistically significant differences in the distribution of design characteristics between short-, medium-, and long-term chatbots, that is, χ²-tests or Fisher's Exact Tests (FETs), respectively. χ²-tests are recommended when all cells have expected frequencies greater than or equal to 5 (Field, 2009, p. 692); FETs are particularly recommended when any expected frequencies are less than 1 (Sauro & Lewis, 2016, p. 79). Results are presented in a comprehensive contingency table (Table 3).
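The counting step and the choice between the two tests can be sketched in plain Python. The sketch below cross-tabulates temporal profiles against characteristics, derives expected frequencies under independence, and computes Pearson's χ² statistic. The function names, the example data, and the simplified decision rule in `suggested_test` (χ² if every expected count is at least 5, Fisher's exact test otherwise) are assumptions made for illustration; p-values and the exact test itself would in practice come from a statistics package.

```python
from collections import Counter


def contingency_table(profiles, characteristics):
    """Cross-tabulate temporal profile (rows) against design characteristic (columns)."""
    counts = Counter(zip(profiles, characteristics))
    rows = sorted(set(profiles))
    cols = sorted(set(characteristics))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson's chi-square statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def suggested_test(table, threshold=5):
    """Chi-square if every expected count reaches the threshold, else Fisher's exact test."""
    smallest = min(min(row) for row in expected_frequencies(table))
    return "chi-square" if smallest >= threshold else "fisher-exact"


# Hypothetical codes for one dimension (not the study's data).
profiles = ["short"] * 30 + ["long"] * 30
characteristics = (["not present"] * 20 + ["present"] * 10
                   + ["not present"] * 10 + ["present"] * 20)
rows, cols, table = contingency_table(profiles, characteristics)
print(rows, cols, table)
print(chi_square_statistic(table), suggested_test(table))
```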

Chatbot archetypes development
Since "reports of descriptive metrics such as frequencies are […] in most cases not sufficient to fully understand complex aspects […] on a more general level" (Rack et al., 2018, chap. 14, p. 288), we used the frequency analysis results as a basis for further statistical analysis by calculating an index per design dimension that can be used to compare chatbots with different temporal profiles and to systematically derive time-dependent chatbot archetypes.
We computed this "Index I d " for each of the 17 design dimensions D whose characteristics can be ordered (e.g., the characteristics C 3,1 short, C 3,2 medium, and C 3,3 long of the dimension D 3 duration of interactions). This Index I d can take a value between 1 and 5 and will be computed as shown in Equation (1), where C i represents the frequency of the i-th design characteristic C d and n the number of design characteristics per design dimension D: That is, Index I d is the mean of the factored frequencies of all design characteristics of a design dimension. Index I d is computed for each design dimension and for each short-, medium-, and long-term chatbot archetype separately. All index values per design dimension and chatbot archetype are plotted in a "design configurator" on semantic differential scales to compare all three archetypes simultaneously (Fig. 3). The elements of the design dimensions D 5 role, D 13 front-end user interface, D 20 application domain, and D 21 motivation/purpose for chatbot use could not be ordered in a meaningful way and are therefore not represented in the figure.
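To make the verbal description of the index concrete, the following sketch shows one plausible reading of it. It is an assumption, not a reproduction of Equation (1): the n ordered characteristics of a dimension are assigned evenly spaced factors between 1 and 5 (e.g., 1, 3, 5 for three characteristics), and the index is the frequency-weighted mean of these factors, which keeps the result between 1 and 5. The function names and the example counts are hypothetical.

```python
def scale_factors(n, low=1.0, high=5.0):
    """Evenly spaced factors from low to high for n ordered characteristics (assumed)."""
    if n < 2:
        raise ValueError("a dimension needs at least two characteristics")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def design_index(frequencies):
    """Frequency-weighted mean position on the 1-to-5 scale (assumed formula)."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("no chatbots were classified on this dimension")
    factors = scale_factors(len(frequencies))
    return sum(f * c for f, c in zip(factors, frequencies)) / total


# Hypothetical counts for short / medium / long duration of interactions.
print(design_index([30, 10, 0]))  # archetype dominated by short interactions
print(design_index([0, 10, 30]))  # archetype dominated by long interactions
```

Under this reading, an archetype whose chatbots all carry the first characteristic of a dimension scores 1, one whose chatbots all carry the last scores 5, and mixed distributions fall in between, which matches the semantic differential presentation described for Fig. 3.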

Results
In line with our two research questions, our results are presented in two parts as well: First, the final "Design Taxonomy for Chatbots with Different Temporal Profiles", resulting from the taxonomy development and evaluation procedure, is introduced (RQ1) and, second, the results from applying the taxonomy to 120 chatbots to analyze differences between short-, medium-, and long-term chatbots (RQ2) are presented.

Part 1 & 2: Design taxonomy for chatbots with different temporal profiles
The final taxonomy (Table 2) provides chatbot designers and researchers with a framework of design dimensions and characteristics for chatbots with different temporal profiles (RQ1). In the following, we present all design dimensions and design characteristics following the structure of the five overarching design perspectives, which themselves can be differentiated with regards to whether they relate to the (i) chatbot, to the (ii) user-chatbot relationship, or to the (iii) user alone (Baraka et al., 2020, p. 3). The perspectives temporal profile, appearance, and intelligence relate to the chatbot, the interaction perspective reflects the user-chatbot relationship, and context relates to the user's circumstances and intentions to engage with the chatbot in the first place.

Temporal profile
The first overarching perspective, a chatbot's temporal profile, can be characterized by the D 1 time horizon of the user-chatbot relationship, the D 2 duration of (individual) interactions, the D 3 frequency, and the D 4 consecutiveness of interactions with the user.
The D 1 time horizon of a user-chatbot relationship can be either C 1,1 short-term, C 1,2 medium-term, C 1,3 long-term, or C 1,4 life-long (Baraka et al., 2020). Short-term relationships are characterized by only a single or a few occasional interactions (e.g., self-diagnosis healthcare chatbots like BABYLON or GYANT). Medium- and long-term relationships always consist of multiple interactions over a certain period (Baraka et al., 2020, p. 29). A typical example of a medium-term chatbot is an educational chatbot that teaches a defined chunk of a particular course's content (e.g., CODE-MONKEY or BOOKBUDDY) over a defined period (e.g., one school semester). A typical long-term example is a chatbot that monitors a patient's weight-loss progress (e.g., WEIGHTMENTOR) for a sustained period. Life-long relationships differ from long-term ones in that the chatbot aims to offer companionship similar to a partnership or friendship that may persist through major changes in a person's life (Baraka et al., 2020, p. 30).
We furthermore included the design dimension D 4 consecutiveness of interactions to capture whether multiple interactions are C 4,1 unrelated or C 4,2 related as a chatbot-based service. Unrelated interactions are typical for chatbots that provide style recommendations based on the current product database (e.g., Levi's INDIGO) or chatbots that curate information, for example, about HIV (e.g., SHIHBOT). Related interactions are typical, for example, for a language teaching chatbot such as DUOLINGO, which tutors multiple sequential units of a topic.

Table 3
Distribution of design characteristics per design dimension and temporal profile.

Intelligence
In contrast to the appearance perspective, the intelligence perspective entails all design dimensions that characterize a chatbot's inner working mechanisms on which its functionalities are based. These include its D 8 intelligence framework, its D 9 intelligence quotient, and its capabilities to D 10 adapt its personality, to adequately and autonomously react D 11 socio-emotionally to user sentiments, and D 12 to integrate and process information from further services and sources such as C 12,2 additional external data or C 12,3 media resources. The differentiation of service integrations into C 12,3 additional media or C 12,2 external data highlights whether a chatbot is capable of broadcasting media such as videos and pictures to users (e.g., SEPHORA KIK BOT) or of integrating and processing external data, for example, from a product database (e.g., 1-800-FLOWERS) or from users' devices (e.g., LARK). Furthermore, we assessed whether D 11 socio-emotional behaviors were C 11,2 present or C 11,1 not and classified spontaneous empathic reactions as present (e.g., "that's great to hear" in reaction to a user who had indicated that she had slept well; e.g., YOUPER, BROOK).

Interaction
The interaction perspective comprises all design dimensions that are related to the interactions between the user and the chatbot. This includes the D 13 front-end user interface for which a chatbot has been developed, which allows a user to access (or not) a certain chatbot, for example, via C 13,2 social media platform messengers such as Facebook, via stand-alone C 13,1 apps, on C 13,4 websites, via C 13,3 communication and collaboration messenger platforms such as Kik or WhatsApp, or C 13,5 combinations of these interfaces. D 14 Communication modality (i.e., C 14,1 text, C 14,2 speech, or C 14,3 hybrid), D 15 interaction modality (i.e., C 15,1 graphical via quick response buttons only or C 15,2 interactive, which allows for free text inputs), and D 16 user assistance design define the mode of operation of the user-chatbot relationship. D 16 User assistance denotes the "locus of control" (Følstad et al., 2019) and indicates who is in charge of the conversational flow, meaning whether the chatbot only reacts to user inputs (C 16,1 reactive), whether it steers the conversation (C 16,2 proactive), or whether it is capable of alternating between the two (C 16,3 reciprocal).
D 17 Personalization refers to a chatbot's capability to adapt a conversation based on previous interactions and inputs from a user. D 18 Additional human support refers to the possibility of human interventions that complement or accede the user-chatbot interaction (Kowatsch et al., 2017). D 19 Gamification specifies whether gamification elements such as quizzes are C 19,2 present or C 19,1 not.
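To illustrate how the interaction-related dimensions could be recorded when classifying a chatbot, the following Python sketch encodes D 14 to D 16 as enumerations. Only the dimension and characteristic labels are taken from the text above; the class names, field names, and the example classification are hypothetical and not part of the taxonomy itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CommunicationModality(Enum):  # D 14
    TEXT = "C 14,1"
    SPEECH = "C 14,2"
    HYBRID = "C 14,3"


class InteractionModality(Enum):  # D 15
    GRAPHICAL = "C 15,1"    # quick response buttons only
    INTERACTIVE = "C 15,2"  # free text inputs allowed


class UserAssistance(Enum):  # D 16, the "locus of control"
    REACTIVE = "C 16,1"
    PROACTIVE = "C 16,2"
    RECIPROCAL = "C 16,3"


@dataclass(frozen=True)
class InteractionProfile:
    """Hypothetical record of one chatbot's D 14 to D 16 classification."""
    communication: CommunicationModality
    interaction: InteractionModality
    assistance: UserAssistance

    def codes(self) -> List[str]:
        # Characteristic codes in dimension order, e.g. for a coding sheet.
        return [self.communication.value,
                self.interaction.value,
                self.assistance.value]


# Hypothetical example: a text-based chatbot with free text input
# that alternates control of the conversation with the user.
example = InteractionProfile(
    communication=CommunicationModality.TEXT,
    interaction=InteractionModality.INTERACTIVE,
    assistance=UserAssistance.RECIPROCAL,
)
print(example.codes())
```

Because each dimension is an enumeration, a classification can hold only one characteristic per dimension, which mirrors the mutually exclusive characteristics of the taxonomy.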

Context
The context perspective gathers all design dimensions that refer to users' initial motives to engage with a chatbot. This is reflected in the D 20 application domain, whether a user seeks a D 22 goal-oriented collaboration (or not) and a user's primary D 21 motivation to engage with a chatbot in the first place.

Distribution of design characteristics per temporal profile
Comparing the distribution of design characteristics per design dimension, χ² tests or FETs, respectively, revealed significant differences between short-, medium-, and long-term chatbots (RQ2) for 19 out of 22 design dimensions (cf. Table 3). There were no significant differences in the distribution of design characteristics per design dimension between the different temporal profiles for D [...] 29 (24.2%) as long-term chatbots. The only chatbot in our sample that could potentially be classified as a life-long chatbot (i.e., REPLIKA) was here classified as long-term, as it is not yet mature enough to live up to the claim of life-long companionship.
Regarding the D 2 frequency of interactions, most short-term chatbots (71.2%) provided one-time-only interactions. Still, 28.8% were classified as offering multiple occasions for interactions. For example, GYANT is a symptom-checking and medical screening chatbot that can be consulted multiple times; however, each interaction with the chatbot starts as an independent conversation. In contrast, medium- and long-term chatbots are exclusively (100%) characterized by multiple interactions, χ²(2, N = 120) = 63.21, p < .001, V = 0.726.
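The reported test statistic can be retraced with a few lines of Python. The sketch below computes Pearson's χ² and Cramér's V for a 3 × 2 table of temporal profile by frequency of interactions. The cell counts are not stated in the text; they are reconstructed here under the assumption that the sample comprises 66 short-term, 25 medium-term, and 29 long-term chatbots, which is consistent with the counts and percentages reported in this section (e.g., 71.2% of 66 is approximately 47). Function and variable names are illustrative.

```python
import math
from typing import List, Tuple


def chi_square_cramers_v(table: List[List[int]]) -> Tuple[float, float]:
    """Return Pearson's chi-square and Cramér's V for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Cramér's V normalizes chi-square by n and the smaller table dimension.
    k = min(len(row_totals), len(col_totals)) - 1
    return chi2, math.sqrt(chi2 / (n * k))


# Rows: short-, medium-, long-term chatbots.
# Columns: one-time-only interactions, multiple interactions.
# Counts are reconstructed from the reported percentages (assumption).
frequency_table = [[47, 19], [0, 25], [0, 29]]
chi2, v = chi_square_cramers_v(frequency_table)
print(f"chi2 = {chi2:.2f}, V = {v:.3f}")
```

With these reconstructed counts, the sketch yields χ² ≈ 63.21 and V ≈ 0.726, in line with the values reported above.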

Appearance.
Fisher's Exact tests show that the design dimensions D 5 role and D 6 primary communication style depend on a chatbot's temporal profile: While the majority of chatbots are equipped with a C 5,2 facilitator role (61.7% of all short-term, 64.0% of all medium-term, and 44.8% of all long-term chatbots), 44.8% of all long-term chatbots are personified as C 5,3 peer characters, compared to 20.0% of all medium-term and only 3.0% of all short-term chatbots. Finally, 21.7% of all short-term chatbots exhibit a C 5,1 expert role, compared to 16.0% of all medium-term and 10.3% of all long-term chatbots, FET = 25.36, p < .001, V = 0.334.

Intelligence.
Concerning the intelligence layer, D 8 intelligence framework, D 9 intelligence quotient, D 10 personality adaptability, and D 11 socio-emotional behavior are significantly associated with the different temporal profiles of chatbots. While the majority of chatbots either relied on purely C 8,1 rule-based text generation (49.2%) or on a C 8,2 hybrid solution generally following a rule-based conversational path but integrating some natural language processing capabilities to learn from the conversation (48.3%), only three (10.3% of all) long-term chatbots (i.e., REPLIKA, BRAINBOT, VIRTUAL IMAGINARY INTERLOCUTOR) could be classified as purely C 8,3 artificially intelligent chatbots, FET = 8.37, p = .049, V = 0.219. The same distribution is reflected in chatbots' D 9 intelligence quotient, which differed significantly across temporal profiles, FET = 8.94, p = .040, V = 0.199: Overall, 36.7% of all chatbots could be classified as possessing C 9,1 rule-based knowledge only, 60.0% possessed some more or less basic C 9,2 text understanding capabilities, and one medium- and three long-term chatbots (3.3% of all chatbots) could process information from other sources than text, for example, from images of an injury (i.e., MBOT).
While only seven (5.8%) out of all 120 chatbots are capable of adapting their personality to the user, all of them were either medium-term (1) or long-term (6) chatbots, FET = 13.18, p = .001, V = 0.364. Accordingly, D 11 socio-emotional behaviors were only present in 45.5% of all short-term chatbots and in 56.0% of all medium-term, but in 79.3% of all long-term chatbots, FET = 8.96, p = .010, V = 0.274.

Interaction.
In the interaction layer, Fisher's Exact tests show significantly different distributions of design characteristics with regards to the temporal profile of the chatbots for six of seven design dimensions: D 13 front-end user interface, D 14 communication modality, D 16 user assistance design, D 17 personalization, D 18 additional human support, and D 19 gamification.
A majority of all chatbots in the total sample are deployed either as pop-up windows on C 13,4 websites (37.5%) or as artificial contacts in C 13,2 social media messengers (28.3%). There were significant differences between chatbots' temporal profiles, FET = 39.86, p < .001, V = 0.410. While 59.1% of all short-term chatbots were accessible via C 13,4 websites and only 9.1% via standalone C 13,1 applications, 44.8% of all long-term chatbots were only accessible via a standalone C 13,1 application, and two (6.9%) only on websites (i.e., MONDLY and KIM).
While we only included text-based chatbots in our sample, some of them (15.0%) also allowed for voice input or offered voice output (C 14,2 text + voice). Differences between chatbots with different temporal profiles were significant, FET = 7.01, p = .023, V = 0.241: 7.6% of all short-term, 20.0% of all medium-term, and 27.6% of all long-term chatbots could not only communicate via text messages but also via voice.
Similarly, chatbots differ with regard to their capability to adapt their content based on what they (already) know about the user and thus to personalize the conversation, χ²(2, N = 120) = 34.35, p < .001, V = 0.535: While 86.2% of all long-term chatbots could be characterized as C 17,2 adaptive, only 22.7% of all short-term chatbots personalize the conversation. Again, medium-term chatbots were more evenly distributed: 44.0% can be characterized as C 17,1 static, 56.0% as C 17,2 adaptive. Furthermore, 25.0% of all chatbots offered the possibility to connect the user with a human (live) agent, with significant differences between the temporal profiles, χ²(2, N = 120) = 7.82, p = .023, V = 0.255: While 23 (34.8%) of all short-term chatbots were wired to a human agent, this was only true for four (16.0% of all) medium-term (i.e., @DAWEBOT, DROP-SHIPPING ASSISTANT, STINA, and ANA COPA AIRLINES) and three (10.3% of all) long-term chatbots (i.e., BROOK, WYSA, and ANNA/LUKAS).
Overall, the distributions of short- and long-term chatbots in business and healthcare in our total sample are largely in line with previous research (Tudor Car et al., 2020); for daily life and educational chatbots, we could not find similar analyses.
While the presence of a D 22 collaboration goal was not associated with the different temporal profiles, chatbots' D 21 usage motivation/purpose differed significantly across the temporal profiles, FET = 72.51, p < .001, V = 0.531: C 21,5 coaching and supervision was the dominant motivation for using medium- or long-term chatbots (75.9% of all long-term, 40.0% of all medium-term, and none of the short-term chatbots were classified as such); C 21,3 utility is the dominant motivation for using short-term chatbots (53.0%), followed by C 21,4 informational (28.8%).

Time-dependent chatbot archetypes
Comparing the calculated indices per design dimension and temporal profile revealed a consistent pattern of differences between short-, medium-, and long-term chatbots (RQ2). Since the design characteristics were always ordered hierarchically from none/low/less to high/more, the design of long-term chatbots can be characterized as more advanced, complex, and sophisticated than that of medium- or short-term chatbots in all design dimensions except for the availability of D 18 additional human support, which was more frequently present in short- and medium-term chatbots.
The results are depicted in Fig. 3. Since the design characteristics are mutually exclusive, the visualization in Fig. 3 allows emphasizing fundamental design differences between the different temporal profiles, while accounting for the fact that the archetypes ought not to be understood as separate, dichotomous entities that cannot share common characteristics or tendencies. The differences are subsequently conceptualized into three time-dependent archetypes.

Ad-hoc Supporters.
Short-term chatbots designed for short, isolated, one-time-only interactions are denominated as Ad-hoc Supporters. With regards to their level of development, they are often based on less advanced and less complex technologies. Furthermore, they are usually not designed to offer services beyond the services that the website on which they are typically implemented already provides. Thus, Ad-hoc Supporters are generally not intended to replace but to complement a company's existing communication channels. Ad-hoc Supporters are furthermore strongly task-oriented, which is not only reflected in their primary communication style but also in the fact that they neither (need to) possess the ability to adapt their personality nor other aspects of the conversation to prior interactions with the user. Furthermore, to ensure that the underlying problems for which users primarily approach them (mostly informational and utilitarian reasons) are solved efficiently, they tend to connect users more quickly with a human agent than chatbots with other temporal profiles.

Persistent Companions.
Long-term chatbots, which are designed for longer, interdependent, and perpetual interactions, are denominated as Persistent Companions. To meet expectations that come along with long-term oriented relationships, they appear more advanced and flexible along multiple dimensions: They are designed in a way that allows users to steer a conversation in a certain direction, but they are also capable of proposing new directions proactively, for example, when a conversation is stuck. Overall, Persistent Companions appear to possess characteristics that support relationship-building processes with users: Their socially-oriented communication style often allows for social talk and chitchat besides the primary objectives of the conversation. Furthermore, they are intended to adapt their personality in the course of the relationship-building process and to personalize conversations based on what they have learned about a user's profile in prior interactions. To further increase the variability of the interaction and to account for people's primary motivation to engage with Persistent Companions (i.e., entertainment and coaching), Persistent Companions are also more likely to integrate gamification elements.
To endow Persistent Companions with the necessary technical capabilities, they tend to be developed as stand-alone applications, which allows implementing a variety of functionalities beyond the features that platform-dependent specifications dictate. Thereby, they can integrate additional services and embed information from external databases or process information retrieved from media elements. Overall, Persistent Companions are likely aimed at (partly) replacing or superseding existing offerings.

Temporary Advisors.
Chatbots for medium-term relationships are here denominated as Temporary Advisors. While they meander between short- and long-term chatbots with regard to their design characteristics, their temporal profile is more comparable to that of long-term than short-term chatbots, as they are designed for multiple, (partly) associated medium-length interactions. However, similar to short-term chatbots, they are less likely to adapt their personality and rely on less advanced technologies. More similar to long-term chatbots again, they refrain from providing instant contact to a human agent and are instead more likely to integrate additional services and features (such as processing of external data and gamification elements) to help resolve the issue for which a user approached the chatbot in the first place.

General discussion
Since chatbots are becoming increasingly prevalent across all industries, managers' success increasingly depends on their ability to adapt a chatbot's design to the conditions it is developed for, which includes, in particular, for how long users intend to interact with the chatbot.
In order to answer our first research question (RQ1: "Which design elements allow us to distinguish chatbots depending on whether they are aimed to help individuals to achieve short-, medium-, or long-term goals?"), following an established taxonomy development method, we developed a design taxonomy to characterize user-chatbot relationships with different time horizons, comprising 22 design dimensions and 61 design characteristics organized into five overarching perspectives that are visible or experiential within the user-chatbot relationship. In order to answer our second research question (RQ2: "How does a chatbot's temporal profile affect its design?"), we classified and analyzed 120 chatbots with regard to their distribution of design characteristics per design dimension. Frequency analysis revealed significant differences between the chatbots, depending on whether they are aimed to help individuals to achieve short-, medium-, or long-term goals, for 19 out of 22 design dimensions. Further analyses allowed us to derive, distinguish, and conceptualize three distinct time-dependent chatbot archetypes (i.e., Ad-hoc Supporters, Temporary Advisors, and Persistent Companions) that allow scientists and practitioners alike to understand, study, and take into account design particularities inherent in the time horizon of the user-chatbot relationship.

Theoretical contribution
Even though researchers from different fields have acknowledged the importance of temporal design aspects for users' engagement (Baraka et al., 2020; Hildebrandt et al., 2004, pp. 1737-1738; Karahasanović et al., 2019, pp. 376-390) and their individual "trajectories of interaction" (Benford et al., 2009, p. 109), time-dependent design aspects of chatbots have not been investigated systematically so far. To this end, our research offers four main theoretical contributions: First, research that took into account a chatbot's temporal profile as a determining design factor has so far been "one-sided" and predominantly focused on investigating design factors that drive users' engagement with long-term chatbots (Bickmore & Picard, 2005; Hobert & Berens, 2020), without ever questioning the transferability of possibly successful design factors to chatbots that were developed for short-term relationships. Thus, similar to prior research that has provided classifications and frameworks for various specific foci such as healthcare chatbots (Laranjo et al., 2018), business-to-business chatbots, or collaborative team chatbots (Bittner et al., 2019; Seering et al., 2019), this work takes up prior calls for research to consider time as an important factor in user-chatbot relationships and provides a holistic perspective on chatbots' different temporal profiles. The development and provision of a comprehensive taxonomy of time-dependent design elements for chatbots with different temporal profiles (RQ1) enables researchers and practitioners to compare fundamental design differences between chatbots for short-, medium-, or long-term purposes.
Second, by providing insights about the impact of a chatbot's temporal profile on its design (cf. RQ2), the derivation and differentiation of three time-dependent chatbot archetypes allows researchers to extend the conceptual understanding of chatbots' social roles in user-chatbot encounters (Scarpellini & Lim, 2020). Apparently, chatbots supporting short-term goals rather aim at assisting and complementing human actors and work as receptionists that connect users with a human agent to make sure customer problems are solved quickly. Their task-oriented communication style reflects their aspiration for efficiency and the minimization of "cost, effort, and time allocated to the interaction" (Verhagen et al., 2014, p. 534), fulfilling a social role similar to a supportive "assistant". Quite the contrary, chatbots that are developed to accompany users over longer periods are often developed in greater depth and with more complexity to be ultimately capable of working independently from any human agent (De Keyser et al., 2019). The comparatively more pronounced manifestations of characteristics that anthropomorphize and personalize the interaction, such as the integration of socio-emotional behaviors, the socially-oriented communication style, the inclusion of gamification elements, and the adaptation of the chatbot's character in the course of the user-chatbot relationship, are all indicators that, for chatbots that aim to help individuals achieve long-term goals, "greater emphasis is put on the feeling of solving a problem together, being more [responsive] to personal needs and enhancing social contagion" (Verhagen et al., 2014, p. 535).
While some of the aforementioned design manifestations likely also depend on other factors, for instance, the functional purpose for which a chatbot has been developed (Scarpellini & Lim, 2020), our analysis offers evidence that the temporal profile is a decisive factor as well and strengthens the notion of chatbots' dual role as a communication medium and, in their role as social actors, as communication partners.
Third, with the development of a chatbot taxonomy, we not only contribute to the chatbot research community with new knowledge but also methodologically to the taxonomy development field. We not only used the ending conditions as suggested by Nickerson et al. (2013) to evaluate the taxonomy's comprehensiveness (Iteration 6) but also asked two previously uninvolved researchers in iteration 7 to classify a new set of chatbots using the developed taxonomy. Furthermore, one of the researchers involved in the development process hitherto classified this new set as well which allowed us to show that external researchers can apply the taxonomy correctly and that the taxonomy is applicable to a new dataset (cf. Table A5, Web Appendix). Moreover, analogous to the use of interview guides in other qualitative research, our research is the first to classify real-world chatbots from a temporal-based perspective through chat logs obtained from standardized, semi-structured dialogue guides (cf. Table A6, Web Appendix), which should become a standard for the analysis of real-world chatbots to ensure comparability.
Lastly, we demonstrate that a taxonomy can be used to test differences with regards to a specific, predefined design characteristic (i.e., temporal profile) instead of using cluster analyses to discover latent archetypes within a dataset as in other taxonomy-based research (e.g., Diederich et al., 2019) and suggest a novel visual representation of the design configurations for the differentiated archetypes (Fig. 3). In this vein, our approach serves as a guideline for future research that strives to recognize structures with a focus on a certain superordinate feature.

Managerial implications
Even though our analysis is based on 120 real-life chatbots and, thus, based on past design choices of practitioners and chatbot developers, a systematic understanding of time-dependent aspects in the design of chatbots was missing. Therefore, the present research offers actionable guidelines and a salient framework that can guide practitioners from the first day in designing, developing, and implementing a chatbot with a specific time horizon.
First, our definitions of three time-dependent chatbot archetypes offer practitioners an explicit representation of the time horizon as a determining factor for a chatbot's design. Having a common understanding and definition of a chatbot's temporal profile and being aware of this factor helps to prevent communication problems within companies between product managers and developers.
Second, the taxonomy of time-dependent design elements together with the conceptualization of three chatbot archetypes lay a solid foundation for streamlining the design process of structures and architectures of domain-specific chatbots when the intended temporal profile of the user-chatbot relationship is clear, which in turn reduces designers' efforts, cost, and time to develop and implement new chatbot-based services.
Third, while each of the design dimensions entails challenges and opportunities, the taxonomy gives designers the flexibility to add and combine dimensions to prototype and tailor the chatbot development quickly to any desired target group or use case while taking into account boundary conditions and restrictions (e.g., available budget or development expertise).

Limitations, further research, and concluding remarks
As with any research, this work has some limitations, which offer opportunities for future research directions (RDs).
While we thoroughly followed an established taxonomy development procedure (Nickerson et al., 2013), the limitations of this study mainly stem from the subjective choices inherent in any qualitative research approach. This subjectivity may, for example, to a certain extent underlie the construction of our sample(s). While the final taxonomy relies on the classification of 120 chatbots from three different samples and sampling strategies, which ensures wide coverage of available chatbots, further analyses of archetypes could investigate boundary conditions that relate to interactions with situational design factors (RD1) other than the temporal profile, for example, across different domains (Feine et al., 2020) or purposes (Scarpellini & Lim, 2020). Notwithstanding, we applied a systematic empirical evaluation process to analyze the final sample under the same structure and attributes, assuring homogeneity in quality and data format. Likewise, we maintained a consistent unit of analysis throughout our research, relying on the same aforementioned sample to develop the taxonomy and to analyze differences between the three identified temporal profiles.
Furthermore, while our research offers insights into a time-aware design of chatbot-based services, we have purposefully limited the scope of our study to domain-specific, text-based chatbots. Yet, it needs to be investigated to what extent our taxonomy can be used to inform the design of other types of conversational agents, such as general-knowledge and/or voice-based conversational agents (RD2). Voice assistants, such as ALEXA, CORTANA, or SIRI, often accompany users over longer periods (Knote, Janson, Söllner, & Leimeister, 2019), yet, at the same time, they can usually be equipped with so-called "skills" that support individuals' short-term goals. Such skills are often provided by third-party developers, and it needs to be well understood how they can be best integrated into an existing relationship with the voice assistant (RD3). Similarly, many chatbots that are overall aimed to help individuals achieve a specific long-term goal (e.g., losing weight) are simultaneously also designed to help individuals achieve short-term goals (e.g., to reach a certain number of daily steps). Therefore, differentiating subtypes of long-term chatbots likely renders different design choices necessary (RD4).
Although our research does not attempt to assess the success (e.g., user engagement, satisfaction, retention) of the current state-of-the-art configurations of the different identified temporal chatbot archetypes, the technologies that are enabling and driving chatbots' capabilities are advancing quickly. Since "taxonomies are not static but change over time as new objects that may or may not fit into an existing taxonomy are developed or identified" (Nickerson et al., 2013, p. 355), these issues can be addressed in further research projects by re-iterating the taxonomy development procedure from time to time (RD5) and by connecting the identified design elements to specific success factors (RD6) to avoid upcoming gaps between theory and practice.
In conclusion, the present paper demonstrates that the relationship duration is a central factor in the design of chatbots and offers new directions for investigating nuances of engaging time-dependent chatbot design. In this spirit, this work strives to serve as a foundation for further researchers undertaking design-related research projects that ultimately enable the optimization of the development of chatbots.