The influence of conversational agent embodiment and conversational relevance on socially desirable responding

Conversational agents (CAs) are becoming an increasingly common component in a wide range of information systems. A great deal of research to date has focused on enhancing traits that make CAs more humanlike. However, few studies have examined the influence such traits have on information disclosure. This research builds on self-disclosure, social desirability, and social presence theories to explain how CA anthropomorphism affects disclosure of personally sensitive information. Taken together, these theories suggest that as CAs become more humanlike, the social desirability of user responses will increase. In this study, we use a laboratory experiment to examine the influence of two elements of CA design, conversational relevance and embodiment, on the answers people give in response to sensitive and non-sensitive questions. We compare the responses given to various CAs to those given in a face-to-face interview and an online survey. The results show that for sensitive questions, CAs with better conversational abilities elicit more socially desirable responses from participants, with a less significant effect found for embodiment. These results suggest that for applications where eliciting honest answers to sensitive questions is important, CAs that are "better" in terms of humanlike realism may not be better for eliciting truthful responses to sensitive questions.


Introduction
Advances in technology since the mid-1990s have ushered in a new age of communication where many face-to-face (FtF) interactions have been replaced by interactions between humans and computers. These interactions may be in the form of computer-mediated communication between two or more humans, or in the form of human-computer interactions, in which the computer is the ultimate communication partner. While many human-computer interactions remain clearly in the domain of a human interacting with a computer using conventional methods and norms (i.e., using the keyboard or mouse to perform specific tasks), an emerging area of interest is the replacement of human agents with conversational agents (CAs): systems that mimic human-to-human communication using natural language processing, machine learning, and/or artificial intelligence [1].
The idea of interacting with a computer as if it were another human has fascinated users and developers of information systems for many years. Early implementations of CAs were novelties designed to play specific roles such as the Rogerian psychotherapist ELIZA [2], and PARRY-a paranoid patient [3]. As technological capabilities have advanced, these "toy" CAs have given way to the emergence of sophisticated and generalizable frameworks that parse user responses and mimic understanding by responding to pre-defined phrases or keywords (e.g., A.L.I.C.E. [4] and ChatScript [5]) [6]. These and other similar platforms have recently ignited a substantial increase in the popularity of CAs and many popular instant messaging and social media platforms, such as Facebook Messenger, WhatsApp, and Kik, have integrated tools to develop and deploy CAs. These efforts have been met with enthusiastic response from users. For example, in the year following the introduction of its bot integration platform in 2016, Facebook Messenger saw the introduction of over 34,000 conversation agents, or "bots" [7].
This increase in pervasiveness and utility has resulted in CAs taking on more serious roles such as serving as virtual personal assistants [8], conducting medical interviews [9,10], providing therapy for depression and anxiety [11], disseminating emergency response information [12], and conducting interviews to detect fraud and deception [13,14]. In many of these scenarios, the information being solicited may be considered sensitive and individuals may be unwilling or hesitant to disclose the information-not necessarily for nefarious reasons, but rather to avoid providing answers society would deem unacceptable or confessing undesirable behavior [15]. Because of the wide variety of contexts in which CAs operate, understanding how specific design choices influence user perceptions and behaviors is an important topic of study.
While prior research has thoroughly explored the mechanics of using CAs to conduct interviews and how to make CAs more humanlike, only recently has attention been paid to how design decisions may impact how comfortable users are disclosing potentially sensitive information to a CA [16]. It has been suggested that CAs that are perceived as more humanlike may have the unintended consequence of increasing discomfort in users [17,18]. As emerging applications use CAs to elicit sensitive information from users (for example, in a medical office, performing the interviewing duties of an intake nurse [9]), it is important to understand the effect more humanlike CAs have on information disclosure. The way a question is asked, and who is doing the asking, can have strong effects on the truthfulness of answers given [19,20]. Thus, such design decisions are critical when sensitive personal information must be elicited.
In pursuit of empirically studying the effect of making a CA more humanlike on disclosure of sensitive information, this paper builds on self-disclosure, social desirability, and social presence research. We examine how people adapt the social desirability of their answers in response to the social presence of a CA interviewer, compared to an online survey and a face-to-face interview. The following research question guides this work: How do the conversational capabilities and embodiment of a CA influence disclosure of sensitive information?

Social presence
The influence of humanlike characteristics, such as the capability to hold a conversation and representative embodiment, is explained by social presence: the sense of connection that a user feels with their communication partner [21]. Social presence is frequently manipulated via attributes of the communication medium, such as its richness [22,23]. A medium that allows for richer content evokes a greater sense of social presence than less rich media [24] and can give additional context to communication [25]. In addition to the richness of the medium, the way in which the medium is used and the information conveyed (i.e., the conversational capability of one's partner) also influence perceptions of social presence [26,27].
Given our understanding of how users perceive computers as social actors [28], the influence of social presence on disclosure should apply whether the conversation partner is a human or a computer. Prior research has found that people often treat computer systems as if they were human [1,29]-for example, by applying politeness norms [30], reciprocating self-disclosure [31], and expressing a feeling of connection [32]. In the case of information disclosure, social presence could have either positive or negative effects. On the positive side, social presence can increase trust [33], potentially making people feel more comfortable disclosing. Conversely, greater social presence can also result in negative outcomes as people consider the social desirability of their responses and how their responses might influence their communication partner's opinion of them [34]. We suggest that in an interview situation, particularly one in which sensitive information is being elicited, a greater sense of social presence will evoke more socially desirable responses, in which people are more likely to adjust their responses to match what they think the socially desirable response is.

Self-disclosure and social desirability
For many emerging applications, a core component of enhancing the usefulness of the system is encouraging users to provide information about themselves to the system. When soliciting sensitive information, the effects of attributes of the interviewer on self-disclosure and social desirability must be considered. Self-disclosure is the extent to which individuals share information about themselves purposely and voluntarily [35,36]. Information being disclosed about oneself may present the discloser in a positive, negative, or neutral way, and questions may or may not be viewed by the discloser as being sensitive [37]. With this in mind, a respondent may choose to disclose more or less information, or not disclose any information at all, based on the nature of the interaction.
In addition to deciding how much information to disclose, people may also modify their responses to questions to increase the social desirability of those responses. Social desirability describes the way in which people would like to be seen by others [38]. Modifying responses to be more socially desirable may stem from a desire to improve social status or to avoid negative consequences. When people are asked to disclose socially undesirable information about themselves, social desirability bias can have a strong effect on reporting [39]. Prior research has found that the level of social presence in the way questions are administered can result in important differences in responses. Interactions with lower social presence, such as computer-administered surveys, have been found to result in responses that are less biased by social desirability than those in face-to-face interviews [20]. The effect of social desirability in survey responses has been studied extensively, as it presents a serious threat to the validity of survey measures [19,40,41]. Techniques such as indirect questioning [42] and self- and computer-administration of surveys [43], as opposed to human interviewing, are often used to mitigate the effects of social desirability. In line with these findings, we expect that respondents will vary the social desirability of their responses in accord with the social presence of the interview format. We hypothesize that in the format with the highest social presence (face-to-face) the amount of information disclosed will be the least, and that as social presence is reduced (from face-to-face, to interaction with a CA, and finally to a non-interactive survey) the level of disclosure will increase. Thus, we propose H1:
H1. Interview modalities with higher social presence lead to more socially desirable responding.
When asking interview questions, one important consideration is the sensitivity of the questions being asked, as sensitive questions are more likely to be influenced by social desirability than non-sensitive questions [44]. Among the general population, questions about topics such as medical history, sexual history, and drug/alcohol use are typically considered sensitive [37]. Sensitive questions may result in either nonresponse or high measurement error compared to non-sensitive questions [45], and may elicit less truthful responses as answering them truthfully may cause negative consequences such as shame or punishment [46]. While the aforementioned topics are generally considered to be sensitive, the sensitivity of specific questions is dependent on the individual being asked the question, the asker of the question, and the social acceptability of the topic [45]. Sensitivity can be measured through nonresponse on survey items, or through separate ratings from people indicating their willingness to answer truthfully [47,48].
Since sensitivity and social desirability depend on both the individual and the context, the same question may be of different levels of sensitivity and social desirability for different people, or even for the same person in different circumstances, thus leading to different levels of disclosure [45]. For example, individuals who are under the legal age to consume alcohol tend to overestimate the drinking behaviors of their peers, potentially increasing the perceived desirability of this behavior within that group [49]. Therefore, if a person who is under the legal drinking age is asked about drinking behavior by a peer, the question might be considered to be of low sensitivity, and a higher answer might be perceived as more socially desirable. Thus, the respondent would be willing to disclose, and perhaps even inflate, their drinking behavior to improve the social desirability of their response. However, if an authority figure, such as a parent or teacher, asks the same underage individual about drinking, the question may be deemed sensitive and of negative valence, thus leading the respondent to hide or underreport drinking to avoid punishment [50]. Interview modality has been found to be an influential factor in determining how honest people will be when sharing information about sensitive questions [44,48]. Thus, we present the following hypothesis regarding the moderating effect of question sensitivity:
H2. The influence of interview modality on socially desirable responding is stronger for sensitive than for non-sensitive questions.

CA characteristics
Previous studies have used CAs during survey administration to investigate how CAs in general affect disclosure [19,51]. However, the scope of these investigations has been limited to comparing CAs with other forms of information gathering. In the current work, we explore in greater depth two particular anthropomorphic traits: conversational relevance and embodiment.
Conversational relevance is present when the response to a message is related to the current topic of conversation [52,53]. In a CA, relevance is driven by the ability of the agent to provide the appearance of understanding the user's input by responding in a contingent manner. To illustrate, consider a CA that asks a user the question, "What is your favorite movie?" There are a multitude of responses the user could provide. A conversationally non-relevant CA will provide a generic response regardless of the input provided by the user, while a conversationally relevant CA will parse the user's message and give a response that is related to the content. For example, if the user responds with "Saving Private Ryan," the CA might respond with "I don't watch many war movies." If the user responds with "The Notebook," the CA might reply "Oh, I love Nicholas Sparks movies!" This type of contingent reply can give the impression that the CA understood the input, thus mimicking human-to-human conversation and creating a more humanlike conversation. A CA that does not give conversationally relevant responses, on the other hand, would give the same response-for example, "That sounds like a nice movie"-regardless of the user's input.
Non-relevant responses give the impression that the conversational partner is disconnected from the user, while relevant responses increase the sense of social presence. CAs that communicate well by providing relevant responses are perceived by the user to understand their answers [48]. While this capability has benefits in many interactions (for example, the system may be more useful or more enjoyable to use), computer systems that seem more humanlike might also negate some benefits of computer-based interviews, such as the mitigation of social desirability bias [43], as interviewees may perceive the system to be judging their responses [48]. This is particularly prevalent in socially phobic patients (those who fear interacting with and being evaluated by other people), for whom more humanlike CAs have been found to increase anxiety [54]. Accordingly, we expect to see an increase in socially desirable responding because of the increased social presence of a conversationally relevant agent. Therefore, we hypothesize:
H3. Increasing CA conversational relevance increases participants' socially desirable responding.
Social presence may also be manipulated through embodiment, the visual representation of an agent [55]. Research on the effects of embodiment on social presence has often compared avatars (digital representations of a human) with no visual representation of the communicator [56]. When a CA is given a facial representation, it makes the CA appear more humanlike [57] and increases the naturalness of the communication [58]. Prior research suggests that the mere presence of a face in human- or computer-administered surveys can create pressure to respond in socially desirable ways. For example, Lind et al. [19] showed a strong effect of facial representation on socially desirable responses to surveys, with people showing more socially desirable responses when responding to a survey with an embedded image of a face than with text alone. The counterpart to this phenomenon is also evident in many human-to-human interactions such as the confessional booth or a psychoanalyst's couch, both configurations in which the interviewer's face is hidden from the discloser to encourage more candid responses. Therefore, we hypothesize:
H4. Visual embodiment of a CA increases socially desirable responding.
As in the case of interview modality (described in H2), we suggest that question sensitivity will affect the socially desirable responding we observe due to the CA's varying levels of conversational relevance and embodiment. When sensitive questions are asked, socially desirable responding is expected to be greater than when the agent is asking about non-sensitive topics. Thus, we present our final hypothesis:
H5. The effects of (a) conversational relevance and (b) embodiment on socially desirable responding are stronger for sensitive questions than for non-sensitive questions.

Identifying sensitive questions
To identify topics of varying levels of sensitivity, we first created a list of potential interview questions identified as sensitive or non-sensitive topics by prior research [19]. As part of a separate data collection from the same population as the main study, we asked 138 students to rate from 1 to 6 how comfortable they would feel truthfully answering specific questions about each topic. Among the topics considered, the largest difference in sensitivity was between health and drinking behaviors. Alcohol use is a particularly sensitive topic for college student populations, as alcohol use and abuse on college campuses are salient and controversial topics [59]. Two corresponding questions from each topic were chosen to represent these topics (see Table 1). Because the data were skewed, we used a paired Wilcoxon signed-rank test to evaluate the differences in sensitivity between the two topics. The ratings for health (mean = 5.16) and drinking (mean = 4.67) behavior were found to be statistically different (n = 138, V = 2070.5, p < 0.001); because the scale measures comfort, the lower mean for drinking indicates greater sensitivity. For the population used for our study, drinking is therefore considered relatively high sensitivity and health behavior is considered low sensitivity.
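As a rough illustration of the paired signed-rank comparison described above, the following Python sketch computes the statistic V (the sum of the ranks of the positive paired differences) and a two-sided p-value from a normal approximation. It is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `wilcoxon_signed_rank`, the tie handling (average ranks, zero differences dropped), the omission of a continuity correction, and the example ratings are all introduced here for illustration only.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (V, p) for paired samples x and y.

    V is the sum of the ranks of the positive differences (x - y).
    p is a two-sided p-value from a normal approximation with a tie
    correction and no continuity correction (an assumption of this sketch).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (v - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return v, p


# Hypothetical comfort ratings (1 to 6) from six respondents; not study data.
health_ratings = [6, 5, 6, 4, 5, 6]
drinking_ratings = [5, 5, 4, 3, 6, 3]
v_stat, p_value = wilcoxon_signed_rank(health_ratings, drinking_ratings)
print(v_stat, p_value)
```

With a sample as small as the hypothetical one, an exact test would normally be preferred over the normal approximation; the approximation is more reasonable at sample sizes such as the n = 138 reported here.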

Main study design
To test our hypotheses, participants were randomly assigned into one of six experimental conditions: a face-to-face interview, an online survey, or one of four interactions with a CA. For the CA interactions, the conversational relevance and embodiment conditions were randomly assigned in a 2 × 2 subgroup (see Table 2). One hundred and

Conversational agent development
We used the ChatScript engine [5] to create the interviewing CA for this study. To simplify development of the chat corpus, we chose conversation topics relevant to the subject pool. Since the participants were college students, the CA asked about their major, classes, and recreational activities. Based on these topics, a corpus of patterns and anticipated answers to questions was created. For example, on the topic of majors, if the participant reported computer science as their major, the CA would respond with a message such as, "That's cool, I love technology." Using this initial conversation corpus, we conducted a pilot test to identify potential responses for which matching patterns did not exist. When non-matching patterns were identified, responses were created and added to the corpus. While it is infeasible to match every possible response a user might give, due to the limited scope of the conversation topics we were able to create responses for the majority of inputs given by participants. The interface used for the conversational agent conditions included a few other features to make the conversation feel more like a normal chat conversation. For example, responses from the CA were delayed slightly based on the length of the message to create the illusion the CA was composing a response. During this delay a bouncing dot "waiting indicator" was displayed as is common in many chat applications.
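The pattern-matching and response-delay behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only: the study used the ChatScript engine, whose rule syntax is not reproduced here, and the corpus entries (other than the computer science reply quoted above), the fallback reply, the function names, and the delay constants are all hypothetical.

```python
import re

# Hypothetical mini-corpus for the "major" topic: (pattern, reply) pairs.
# Only the computer science reply comes from the text; the rest is assumed.
MAJOR_CORPUS = [
    (re.compile(r"\bcomputer science\b", re.I), "That's cool, I love technology."),
    (re.compile(r"\b(biology|chemistry)\b", re.I), "Science classes keep you busy!"),
]
FALLBACK_REPLY = "That sounds interesting."  # assumed generic reply


def reply_for(message, corpus, unmatched_log):
    """Return the first matching reply; record inputs that match nothing."""
    for pattern, reply in corpus:
        if pattern.search(message):
            return reply
    # During a pilot test, unmatched inputs show where new patterns are needed.
    unmatched_log.append(message)
    return FALLBACK_REPLY


def typing_delay_seconds(reply, per_char=0.03, minimum=0.5, maximum=3.0):
    """Delay proportional to reply length, clamped to a plausible range.

    The constants are assumptions; the text only says the delay was
    "based on the length of the message".
    """
    return max(minimum, min(maximum, per_char * len(reply)))


unmatched = []
answer = reply_for("I'm studying computer science", MAJOR_CORPUS, unmatched)
print(answer, typing_delay_seconds(answer))
```

In this sketch, the `unmatched` list plays the role of the pilot-test review: each logged input is a candidate for a new pattern and reply in the corpus.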

Procedure
All participants completed an online pre-experiment survey of demographic information before registering for a time to participate in the experiment. Condition randomization was performed after participants completed the survey. All participants who reported for their assigned experiment time either participated in a face-to-face interview with a human, participated in an online interview with a chatbot, or answered the interview questions via online survey software. The same interview questions (Table 1) were asked in each of the conditions. Following the interview, participants were directed to a post-experiment survey. Data from the post-experiment survey relate to other projects outside the scope of this research; however, as previously described, some of the questions in that survey were used to determine whether the participant was attentive and providing valid data during the experiment.

Face-to-face interview
In the face-to-face interview condition, participants reported to a nondescript room where they were directed to sit across from their interviewer. There was a single interviewer (a 34-year-old Caucasian male dressed in professional attire) for all participants. The interviewer was instructed to only ask the questions defined in his script and to maintain a neutral demeanor, minimizing any verbal or non-verbal responses to the communication. Responses were recorded by the interviewer on a paper form and entered into a computer system at the conclusion of the study. Following the interview, participants were directed to a computer in a different room to complete the post-experiment survey.

Online survey
Participants assigned to the online survey condition reported to a computer lab containing 30 workstations, each equipped with a privacy screen. Each computer was configured with a full-screen web browser displaying a survey containing the same set of questions asked in the face-to-face interview. After being seated, participants completed the interview survey, followed by the post-experiment survey.

Conversational agent
As in the survey condition, participants in the CA condition were directed to report to the aforementioned computer lab. Within the CA condition, a nested 2 (conversationally relevant vs. non-relevant) × 2 (embodied vs. unembodied) between-subjects experimental design was used to test the hypotheses involving the CA. Each CA used the same number of utterances in both interviews so that users were presented with the same number of questions. In the unembodied condition, the chat took place without a visual avatar. In the embodied condition, participants interacted with a CA that had an animated face (see Fig. 1).
Each CA interview began with basic rapport-building questions [51] to establish a sense of social presence, or a lack thereof. These questions included general introductory questions such as "What class are you here for?" and "What is your favorite outdoor activity?" It is during these introductory questions that the differences between the relevant and non-relevant CAs were introduced. The non-relevant CA gave generic follow-up questions to each response. For example, for the question about outdoor activities, the non-relevant CA followed up with "What else do you enjoy doing?" The relevant CA, on the other hand, gave different responses based on the participant's response. If the participant responded with "swimming," the CA would follow up with "Water sports are fun. How often do you go?" Similarly, if the participant said "hiking," the CA responded with "I've wanted to try hiking for a while now. When did you start?" A wide variety of responses were matched in this way to create a conversational tone for the interview. After the rapport-building questions, the CA asked the previously described interview questions (see Table 1). There was no difference between the relevant and non-relevant conditions after the initial rapport-building conversation, including during the interview questions. Fig. 2 shows a side-by-side comparison of relevant and non-relevant conversations. During the rapport-building segment, questions 1 and 3 were exactly the same for all participants regardless of experimental condition. In the non-relevant condition (right side of Fig. 2), questions 2 and 4 were also the same regardless of how participants responded. In the relevant condition (left side of Fig. 2), questions 2 and 4 were related to the user's responses to questions 1 and 3, respectively.
To preclude the possibility of contamination due to a participant in the unembodied condition seeing the embodied CA on a nearby computer screen, each session was randomly assigned to have either embodied or unembodied CA conditions. Participants were randomly assigned within a session to either the relevant or non-relevant CA, as there is no obvious visual difference between them.
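The two-level assignment described above (embodiment drawn once per session, relevance drawn once per participant) can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function name `assign_ca_conditions`, the session and participant identifiers, and the use of simple (unblocked) random draws are assumptions, since the text does not say how the random assignment was implemented.

```python
import random


def assign_ca_conditions(sessions, seed=None):
    """Map each participant to an (embodiment, relevance) pair.

    sessions: dict of session id -> list of participant ids (hypothetical).
    Embodiment is drawn once per session so that nobody in the unembodied
    condition can see an embodied CA on a neighboring screen.
    """
    rng = random.Random(seed)
    assignments = {}
    for session_id, participants in sessions.items():
        embodiment = rng.choice(["embodied", "unembodied"])  # per session
        for participant in participants:
            relevance = rng.choice(["relevant", "non-relevant"])  # per person
            assignments[participant] = (embodiment, relevance)
    return assignments


# Hypothetical sessions and participant ids, for illustration only.
example = assign_ca_conditions({"mon_9am": ["p01", "p02"], "mon_1pm": ["p03"]}, seed=7)
print(example)
```

One consequence of this design is that embodiment varies between sessions rather than between individuals, so any session-level differences (for example, time of day) are confounded with embodiment unless sessions are numerous or otherwise balanced.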

Table 2
Breakdown of conditions.

Analysis
The data were prepared for analysis by standardizing the responses for each question. The topic responses were then averaged for each participant, thereby creating composite values for the drinking and health disclosure measures. We do not have ground truth for the participants' drinking and health behaviors, which would require observation of their actual behaviors over time. A general assumption that we make throughout the analyses is therefore that, due to the random assignment of participants to conditions, the average values for drinking and health behaviors are not systematically different across conditions. Under this assumption, any significant differences between the conditions are due to our manipulations rather than differences in actual drinking or health behaviors. While it is not possible to identify how truthful any individual's responses are, we use group trends to estimate the effects of the manipulations on truthfulness overall. This follows the methodology used in previous research on socially desirable responding [60,61].
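The standardize-then-average step can be illustrated with a short Python sketch. It assumes z-scores computed with the sample standard deviation across all participants and an unweighted mean of the items within a topic; the text does not specify either choice, and the function names, question keys, and example answers below are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw answers to z-scores (sample standard deviation assumed)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(responses, topic_items):
    """Average the standardized items of each topic for every participant.

    responses: dict of question key -> list of answers, one per participant,
               in the same participant order for every question.
    topic_items: dict of topic -> list of question keys in that topic.
    """
    z = {question: standardize(answers) for question, answers in responses.items()}
    composites = {}
    for topic, items in topic_items.items():
        n_participants = len(z[items[0]])
        composites[topic] = [
            mean(z[question][i] for question in items) for i in range(n_participants)
        ]
    return composites


# Hypothetical answers from three participants; not study data.
raw = {"drinks_per_week": [0, 2, 4], "days_intoxicated": [0, 1, 2]}
print(composite_scores(raw, {"drinking": ["drinks_per_week", "days_intoxicated"]}))
```

Standardizing before averaging keeps a question with a large numeric range (such as drinks per week) from dominating a question with a small range (such as days intoxicated) in the composite.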
H1 predicted that the amount of disclosure would differ between face-to-face interviews, online surveys, and CA interviews. H2 predicted that the effect in H1 would be stronger for high-sensitivity questions than for low-sensitivity questions. For the analysis of H1 and H2, all CA conditions were grouped together. Before conducting the analysis, we tested for normality by measuring skewness and kurtosis. The drinks measure has significant skewness (1.16) and significant kurtosis (0.92). The health measure has neither significant skewness (0.25) nor significant kurtosis (−0.52). Because the drinks measure was skewed, we used a Tobit model [62,63] for drinks and a generalized linear model for health to test whether the interview type affected disclosure. The models controlled for age and sex, two major factors that are known to be correlated with drinking behavior [64]. As illustrated in Table 3 and Fig. 3, the face-to-face condition elicited more socially desirable responses when asking people about their drinking behavior (i.e., less drinking was reported). For the health questions, participants gave similar responses regardless of the interview type. There was no significant difference between the survey and aggregated CA conditions for either set of questions. Therefore, the results partially support H1 and fully support H2.
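For readers unfamiliar with the normality check mentioned above, the sketch below computes moment-based skewness and excess kurtosis. The text does not state which estimator or significance test was used, so the population-moment formulas, the function name, and the example data are assumptions; the negative kurtosis reported for the health measure suggests, but does not confirm, that excess kurtosis (kurtosis minus 3) was reported.

```python
from statistics import mean


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


# Hypothetical weekly drink counts with a long right tail; not study data.
drinks = [0, 0, 0, 1, 1, 2, 3, 5, 8, 14]
print(skewness_and_excess_kurtosis(drinks))
```

A count-like measure with many zeros and a long right tail, as in the hypothetical data, is the kind of distribution for which a censored-regression approach such as the Tobit model is often considered.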
The remaining hypotheses pertain to the effects of CA conversational relevance (H3), embodiment (H4), and the moderating effect of question sensitivity on relevance and embodiment (H5). We tested the hypotheses using two generalized linear models, one for each topic. As before, we controlled for age and sex. H3 predicts that participants interacting with a CA that gives conversationally relevant responses will give more socially desirable answers than when interacting with a CA that does not. For the sensitive questions (drinking disclosure), the models show a statistically significant effect for conversational relevance: participants in the conversationally relevant condition reported less drinking than participants in the non-relevant condition. The same does not hold true for the less sensitive health behavior questions (Table 4). Thus, H3 was partially supported, and H5a was supported. The results do not show a direct effect for embodiment (H4). The differences between conditions are illustrated in Fig. 4. A summary of the hypothesis testing results is provided in Table 5.
Our analysis used standardized composite measures of the interview responses in order to facilitate the necessary statistical analysis. However, the combination and standardization of the responses makes practical interpretation of the results difficult. Therefore, to more clearly illustrate the effects, we present the raw averages given in response to each question in Table 6. The data show that participants in the face-to-face interview reported about 30% fewer drinks in a typical week than those in the CA or survey conditions. Similarly, they reported 59% fewer days intoxicated than participants in the CA condition and 44% fewer than participants in the survey condition. That is, those in the face-to-face condition likely underreported their drinking in order to provide more socially desirable responses.
We also see interesting outcomes in the raw numbers reported for conversationally relevant vs. nonrelevant CAs. For the sensitive drinking questions, the unstandardized data show that those in the conversationally nonrelevant condition reported an average of 5. Neither of these differences is statistically significant at the 0.05 level. This is consistent with previous research showing that social desirability effects of question administration mode are stronger for undesirable rather than for desirable actions [46].

Discussion
For this study, we developed a web-based chat interface and a CA to interact with users. In our experiment design, participants either participated in a face-to-face interview, interacted with a CA, or completed a web-based survey. Four between-subject conditions were nested within the CA condition: the CA gave either relevant or nonrelevant responses, and either had or did not have a visual embodiment. Both the relevant and nonrelevant CAs asked the same initial questions; however, the nonrelevant CA gave little feedback and asked generic follow-up questions, while the conversationally relevant CA responded with questions relevant to the answer given by the participant. In all conditions there was also a within-subject manipulation: participants were asked questions that were either sensitive (alcohol consumption) or not sensitive (general health behavior).
We tested the effect of interview modality on users' disclosure when responding to the system. The results show a significant difference between the CA and face-to-face interviews, with the human interviewer garnering responses that were higher in social desirability in response to sensitive questions. This difference is consistent with the social presence explanation: the face-to-face interview with a human would have higher social presence than the computer-based interview with a CA, thus resulting in answers that are higher in social desirability.
Contrary to our hypothesis, however, the embodiment manipulation had no significant effect on disclosure. We believe this may be explained by the complex nature of embodiment, whose effect may depend on many factors, including the quality of the animation, the perceived social status of the avatar, gender differences [13], the demeanor of the avatar, similarity to the participant [65], and more. Our embodiment manipulation was a looping animation of a face that did not respond to user messages or provide any visual indication of responsiveness. Future research should investigate other manipulations of embodiment.
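The design described above can be enumerated in a short sketch. This is only an illustration of the condition structure as stated in the text; the function names and condition labels are hypothetical and are not taken from the study materials.

```python
from itertools import product

# Labels below are illustrative; only the structure comes from the text.
RELEVANCE = ("relevant", "nonrelevant")
EMBODIMENT = ("embodied", "not embodied")
# Within-subject manipulation: every participant answers both question types.
QUESTION_TYPES = (
    "sensitive (alcohol consumption)",
    "non-sensitive (general health behavior)",
)


def between_subject_conditions():
    """Return the six between-subject conditions as (modality, relevance, embodiment)."""
    conditions = [
        ("face-to-face interview", None, None),
        ("web-based survey", None, None),
    ]
    # The CA modality contains a nested 2 x 2 design.
    conditions += [("CA", r, e) for r, e in product(RELEVANCE, EMBODIMENT)]
    return conditions


def design_cells():
    """Cross each between-subject condition with both question types."""
    return [(cond, q) for cond in between_subject_conditions() for q in QUESTION_TYPES]


if __name__ == "__main__":
    for (modality, relevance, embodiment), question in design_cells():
        print(modality, relevance, embodiment, question, sep=" | ")
```

Under these assumptions the sketch yields six between-subject conditions (two non-CA modalities plus four CA variants), each observed under both question types.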
These findings are important for understanding how a conversational agent might best be used for interviews. As businesses and researchers develop conversational technology designed to gather sensitive information, in areas including depression counseling [11] and sexually transmitted diseases [10], developers and practitioners must consider what type of CA is best suited to the task. Design considerations such as the relevance of CA responses are important to ensure that the information gathered is as accurate as possible. We find that a more conversationally capable CA (i.e., one that gives more relevant responses) increases socially desirable response bias. This means that the more capable agent receives less accurate information from interviewees. If information accuracy is critical to the success of an application, developers must consider the tradeoff between the social presence of the CA and the social desirability of the users' responses.

Limitations and future work
This work contributes to research on CAs by furthering our understanding of the benefits and potential limitations of using CAs to gather sensitive information. While the current study demonstrates that the design of an interview experience influences the level of disclosure, it has several limitations, and many avenues remain to explore in understanding how conversationally relevant CAs can shape interactions and manipulate individual responses. Future work might explore the validation of responses, empathizing, having a CA disclose embarrassing information, or manipulating the embodiment to look either less or more threatening.
One limitation in the current study, which is common in many social desirability studies, is that we do not have ground truth. Because of this, we rely on statistical assumptions (i.e., through random assignment) rather than actual knowledge of individuals' behavior. While participants who interacted with a conversationally non-relevant CA disclosed more potentially negative sensitive information, it is impossible to say if those people were inflating the truth, accurately reporting, or continuing to under-report.
While the topics used in this study, drinking behaviors and health behaviors, were shown to have significantly different levels of sensitivity, the drinking questions were not perceived as being extremely sensitive. Future work might explore more sensitive questions. Additionally, while alcohol abuse is a salient topic for college students, binge drinking is much less likely after college [66], so research on other populations must consider that different types of questions may be needed to reach the sensitivity required to induce socially desirable responding. Questions about ethnicity or income may serve as a better basis for sensitive questions [48]. A related limitation is that our study was limited to one topic for the non-sensitive and sensitive categories. Further studies should examine whether these findings generalize to other categories of sensitive and non-sensitive questions.

Table 5 Results of hypothesis testing.
Hypothesis | Support
H1: Interview modalities with higher social presence lead to more socially desirable responding. | Supported for sensitive questions
H2: The influence of interview modality on socially desirable responding is stronger for sensitive than for nonsensitive questions. | Yes
H3: Increasing CA conversational relevance increases participants' socially desirable responding. | Supported for sensitive questions
H4: Visual embodiment of a CA increases socially desirable responding. | No
H5a: The effect of conversational relevance on socially desirable responding is stronger for sensitive than nonsensitive questions. | Yes
H5b: The effect of embodiment on socially desirable responding is stronger for sensitive than nonsensitive questions. | No

Table 6 Reported drinking behavior between interviewing conditions (mean of unstandardized values, standard deviation in parentheses).

Conclusion
As CAs are increasingly used in applications such as gathering sensitive information, it is important to evaluate and consider the effects of CA attributes, particularly conversational relevance. Each scenario or application of CAs likely has its own goals, creating different considerations for design. For purposes such as entertainment, assistance, and general computer use, CAs that are perceived as more humanlike may provide great benefit by making interactions more natural and enjoyable. However, as shown here, CAs that are more conversationally relevant lead interviewees to manage their disclosure more carefully and to hide socially undesirable, but potentially important, information. Thus, such CAs may not be appropriate for applications in which eliciting truthful, but potentially embarrassing, information is critical.