Intelligent Cognitive Assistants for Attitude and Behavior Change Support in Mental Health: State-of-the-Art Technical Review

Abstract: Intelligent cognitive assistant (ICA) technology is used in various domains to emulate human behavior expressed through synchronous communication, especially written conversation. Due to their ability to use individually tailored natural language, ICAs present a powerful vessel to support attitude and behavior change. Behavior change support systems are emerging as a crucial tool in digital mental health services, and ICAs excel in effective support, especially for stress, anxiety and depression (SAD), where ICAs guide people’s thought processes and actions by analyzing their affective and cognitive phenomena. Currently, there is no comprehensive review of such ICAs from a technical standpoint, and existing work is conducted exclusively from a psychological or medical perspective. This technical state-of-the-art review tried to discern and systematize current technological approaches and trends as well as detail the highly interdisciplinary landscape of intersections between ICAs, attitude and behavior change, and mental health, focusing on text-based ICAs for SAD. Ten papers describing systems that fit our criteria were selected. The systems varied significantly in their approaches, with the most successful opting for comprehensive user models, classification-based assessment, personalized intervention, and dialogue tree conversational models.


Introduction
Change-what it is, why it happens and how to achieve it, especially in the human psyche-has taken many forms throughout recorded history: for the ancient Greek philosopher Heraclitus, change came from cosmic fire [1]; for the great Chinese teacher Confucius, change was perpetually created by the eternal struggle between opposing forces [2]; for the medieval theologian Thomas Aquinas, change originated from another world [3]; for the systems theory and family therapy pioneer Paul Watzlawick, change emerged from paradoxes [4]; for the information-ager, change is driven by technology. Strangely enough, all of them would be correct if referring to human psyche: Heraclitus' cosmic fire is love (commonly mythologized as one of the sources of change, e.g., by the pre-Socratic philosopher Empedocles [5]), presumed to be the seed that produces and sustains change [6]; Confucius' opposing forces represent cognitive dissonance, behavior opposing attitudes and beliefs, resulting "in a psychologically uncomfortable state that motivates people to reduce the dissonance [...] by changing their attitudes to be more consonant" [7] (p. 1469); Aquinas' another world is the world of the human mind; Watzlawick's paradoxes demystify Confucius' divine opposing forces through a pragmatic psychotherapeutic framework; and the most de nos jours of all-the information ager's notion of how technology influences us is currently one of the most broadly discussed topics that strongly conforms to the reality of living in the information society [8].
The marriage between the advances of behavioral sciences, especially the vast knowledge on the psychological theories of how to effect attitude and behavior change in people, and technology, which has seeped its way into an omnipresent fact of our lives, has seemingly more than ever given us the possibility and the tools to do what has always been the holy grail of human endeavors-to produce effective and rapid change. The information society [8] is equipping us with intelligent informational sources at every step-a smartphone is always in our pockets, a smart bracelet always on our wrists, an internet connection in every nook and cranny of our paths. This gives us the possibility to not only always know how we are behaving, but also to intervene with that behavior.
Such pervasive technology can produce attitude and behavior change in many domains of human life where it is sought: we generally want ourselves and others to be healthy, which means exercising, sleeping and eating well; to feel well, which means successfully navigating situations and thoughts that lead us down the path of mental issues, such as stress, anxiety and depression (SAD); and to adopt other beneficial behaviors, most of which can be found in the UN's Sustainable Development Goals [9]. The issues addressed by such behaviors have seen a catastrophic rise in recent decades. One issue particularly affected by the recent COVID-19 pandemic is mental health. The lack of resources and effective systemic frameworks in the field of mental health is not a recent development, but it was the pandemic that exposed how disastrous neglecting people's well-being for decades can be [10]. Existing systems were further incapacitated by imposed social distancing, which drastically severed the bond between people and mental health experts. Decision-makers are consequently turning towards technology to help in what is not only a pandemic of the body, but also a pandemic of the mind. This work contributes a piece to the needed mosaic of a systematized effort to identify how technology can help in tackling this mental health crisis. This section continues with a statement of our motivation for this work and why we believe it is needed. It then presents three interconnected areas of research that meet in such interdisciplinary efforts: attitude and behavior change (ABC) support systems, intelligent cognitive assistant (ICA) technology, and digital mental health. Finally, it presents related review papers, highlighting why they are insufficient for our purposes and for computer science researchers in general, and ends with an outline of the paper.

Motivation for This Work
The need for this work (a review paper on intelligent cognitive assistants for attitude and behavior change in mental health, specifically for stress, anxiety and depression, from the perspective of researchers in technology-related fields) arose from investigating the technological trends and underpinnings of dialogically driven technology used for psychological help. The search turned up three kinds of papers: (1) review papers, written by clinical experts and psychologists for clinical experts and psychologists [11][12][13][14][15][16]; (2) experiments on symptom relief in people with mental health issues, where an ICA was used; and (3) scattered papers by researchers who designed their own ICAs for mental health.
Papers described in (1) (see the first paragraph of this section), and more thoroughly detailed in Section 1.5, although insightful and helpful in several ways, were not meant for experts in our field of research, as some of them clearly explicate: "This review aimed to inform health professionals" [11] (p. 11) and "[a]lthough [embodied conversational agent (ECA)] research is almost inherently interdisciplinary, we refrained from going too deep into the technological aspects. This was because our target audience consisted of health professionals with a generally less technical background and we wanted to focus on opening up the ECA domain for them as well as providing them with an overview of the available evidence for application in routine clinical practice" [11] (p. 13). They covered few, if any, technical aspects of the ICAs used in experiments to offer psychological help, and mostly focused on the effects the ICAs had on participants. Introducing technical aspects into such work was the first incentive for our present work.
The review papers that do exist on this topic, although not technical, covered a number of papers that presumably should offer some information on the ICAs for mental health used in their experiments. However, such papers, described in (2) (see the first paragraph of this section), either offer little or no technical description of the systems used, or the systems used were proprietary and their technical details are not disclosed [17][18][19]. This made it clear that the works this review strives to examine were not included in previous reviews.
Papers in (3) (see the first paragraph of this section) are the kind of papers this work focuses on. To be rigorous in our systematization, we applied various criteria that papers had to meet to be included in this review (for more, see Section 2.5). We mostly focused on work that was of a technical nature and where it was clear from the system design which technologies and methods were used to provide effective psychological help and cause attitude or behavior change.
The need for the proposed review was identified before: "[I]t has to be noted that, depending on how one would like to use ECAs in future work, many more detailed questions could be investigated surrounding ECA design aspects, such as the required capabilities for, and their impact on, specific disorders or types of ECA interventions" [11] (p. 13). However, to the best of our knowledge, the existing review papers fall under the group of papers described in (1). The specific details that interest us, in contrast to those papers, and that warrant our work can be found in our Research Questions subsection.
This paper is the first systematized overview of intelligent cognitive assistants for attitude and behavior change in mental health with a focus on stress, anxiety and depression from a technical perspective. Due to the technical focus, we opted for the state-of-the-art review (for more, see Section 2.1). This novel research is necessarily interdisciplinary, combining knowledge from computer science, psychology, behavioral sciences, cognitive science, psychotherapy, and related fields. Multiple perspectives, drawing from the authors' backgrounds, are offered on the topic, mainly from computer and cognitive science.

Attitude and Behavior Change Support Systems
ABC support systems are computer systems that attempt to "change attitudes or behaviors or both (without using coercion or deception)" [20] (p. 20) and to "aid and motivate people to adopt behaviors that are beneficial to them and their community while avoiding harmful ones" [21] (p. 66). Attitude and behavior change signifies a phenomenon that is considered to be a temporary or lasting effect on an individual regarding their attitude or behavior as compared to what their attitudes were or how they behaved in the past [21] (p. 66). ABC support systems belong to the larger family of persuasive technology (PT). PT is the result of the vast advances in behavioral sciences with regard to psychological change [22], human decision-making [23] and related phenomena [24], as well as the arrival of digital technologies, artificial intelligence (AI) and big data. Many societal efforts have been put into creating technologies that would help, motivate, guide and persuade people into bettering themselves and the world around them, though such technology can be and has been abused as well [25]. ABC support systems have also been at the forefront of research (e.g., at the world's biggest AI conference, IJCAI) for helping achieve the United Nations Sustainable Development Goals, which include ensuring "healthy lives and promote well-being for all at all ages", "inclusive and quality education for all and promote lifelong learning", taking "urgent action to combat climate change and its impacts" and others [26]. PT is already used in the health and wellness areas, where it tracks people's behavior as well as their physiological and psychological processes and responds to them by trying to affect their mental states through psychotherapeutic advice or by motivating them to make different decisions, e.g., with regard to healthy eating [21]. There are also applications in areas such as education or environmental sustainability, where people are nudged towards greener behavior [27].
CPP is based on the idea that general persuasive strategies are not equally effective for everyone. It identifies various strategies that affect different groups of people differently. Interactive, adaptive technology can then tailor itself to the specific strategies that work for specific groups of people.
The Fogg Behavior Model (FBM) is based on the idea that a certain behavior is the result of motivation, ability and a trigger occurring at the same time. Therefore, a person changing their behavior has to be sufficiently motivated, has to possess the ability to change the behavior, and has to be triggered to change the behavior. These are then combined in personalized ways to find the most effective strategies for an individual.
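The co-occurrence condition described for FBM can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems: the `BehaviorState` class, the 0 to 1 scales, the multiplicative combination of motivation and ability, and the threshold value are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BehaviorState:
    """Hypothetical snapshot of a user at one moment (illustrative scales)."""
    motivation: float      # 0.0 (none) to 1.0 (very high)
    ability: float         # 0.0 (cannot act) to 1.0 (effortless)
    trigger_present: bool  # whether a prompt or cue is currently present


def behavior_expected(state: BehaviorState, threshold: float = 0.25) -> bool:
    """Return True only when motivation, ability and a trigger coincide.

    Assumption: motivation and ability are combined by multiplication and
    must jointly reach an arbitrary threshold.
    """
    return state.trigger_present and state.motivation * state.ability >= threshold


def missing_factor(state: BehaviorState, threshold: float = 0.25) -> Optional[str]:
    """Name the factor an intervention could target, or None if none is missing."""
    if behavior_expected(state, threshold):
        return None
    if state.motivation * state.ability < threshold:
        # Target whichever of the two is currently weaker.
        return "motivation" if state.motivation <= state.ability else "ability"
    return "trigger"


if __name__ == "__main__":
    user = BehaviorState(motivation=0.8, ability=0.9, trigger_present=False)
    print(behavior_expected(user), missing_factor(user))
```

In this sketch, a motivated and able user without a prompt is reported as lacking only the trigger, which is the case where a well-timed message from a system would be the natural intervention.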
The Persuasive Systems Design Model (PSDM) is based on the need for effective design and evaluation of persuasive systems, and mostly offers a framework for what kind of content and functionality PT should consider. PSDM includes four principles upon which to design PT: (1) primary task support, which supports the user in carrying out their primary task; (2) dialogue support, which helps users move towards their goals; (3) system credibility, which raises the user's belief in the system's quality; and (4) social support, which motivates the user by leveraging social influence.
Another powerful and effective behavioral change concept-Richard Thaler, its author, received the Nobel Prize for it-is the 'nudge theory'. Nudge is "any aspect of the choice architecture that alters people's behavior in a predictable way without forbidding any options or significantly changing their economic incentive", where "the intervention must be easy and cheap to avoid" [24] (p. 6). Nudges are being incorporated into PT and ABC support systems as well [31].
For persuasive strategies to be as effective as possible, they have to be tailored to a number of specifics. The Communication-Persuasion Paradigm [32] identifies four factors that determine the influence: (1) characteristics of the source (i.e., the message sender); (2) the message; (3) characteristics of the destination (or the receiver of the message); and (4) the context.
For determining effective strategies, personality models, such as the Big Five personality traits (B5) [33] or HEXACO [34], as well as domain-specific questionnaires, offer PT a useful way to model a person. Personality is measured on different dimensions (e.g., in B5: openness, conscientiousness, extroversion, agreeableness, neuroticism), which try to describe psychological and cognitive functionalities of individuals, e.g., their mental states and decision-making abilities. For knowledge in specific domains, PT relies on questionnaires. For mental health, SAD questionnaires [35] can be used to categorize people with SAD symptoms, which leads to better strategy selection. Such questionnaires give insight into what influences which individuals the most. Empirical phenomenology can also be employed for more detailed first-person accounts [36], which can be used for extracting linguistic features [37,38] or for other tweaking of ABC techniques in PT. Furthermore, combining subjective data with physiological data is also proving useful for adaptive technologies [39].
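How questionnaire results might feed strategy selection can be sketched as follows. This is only an illustration under stated assumptions: the seven-item questionnaire, the 0 to 3 answer scale, the cut-off values, the severity labels and the strategy names are all hypothetical and are not taken from any validated instrument or from the systems reviewed here.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical cut-offs: a total score at or above each value moves the user
# into the next band. These numbers are illustrative, not clinical thresholds.
CUTOFFS = [5, 10, 15]
LABELS = ["minimal", "mild", "moderate", "severe"]

# Hypothetical mapping from severity band to a persuasive strategy.
STRATEGY_BY_LABEL = {
    "minimal": "psychoeducation",
    "mild": "self-guided exercises",
    "moderate": "structured daily check-ins",
    "severe": "recommend contacting a professional",
}


def total_score(answers: Sequence[int]) -> int:
    """Sum the item answers, assuming each item is rated from 0 to 3."""
    for answer in answers:
        if not 0 <= answer <= 3:
            raise ValueError(f"answer out of range: {answer}")
    return sum(answers)


def severity(score: int) -> str:
    """Map a total score to a severity band using the cut-offs above."""
    return LABELS[bisect_right(CUTOFFS, score)]


def select_strategy(answers: Sequence[int]) -> str:
    """Choose a strategy from the questionnaire answers."""
    return STRATEGY_BY_LABEL[severity(total_score(answers))]


if __name__ == "__main__":
    answers = [1, 2, 0, 3, 1, 2, 1]  # made-up responses to seven items
    print(total_score(answers), severity(total_score(answers)), select_strategy(answers))
```

A real system would replace the cut-offs with those published for its chosen instrument and would typically combine the result with personality and interaction data rather than rely on a single score.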
ABC can be delivered through various software systems. Intelligent cognitive assistants (or chatbot, chatterbot, interactive agent, conversational AI, smartbot, bot) seem to be the most advanced [11][12][13][14][15][16][40]. The next subsection introduces such systems and describes why they seem to be the best vessel for delivering ABC.

Intelligent Cognitive Assistant Technology
There is a lack of consensus regarding the term with which to label the technologies this review describes: conversational agents, dialogue systems, smart conversational interfaces, relational agents, chatbots, and so on. In the end, we decided to follow the SRC workshop on ICA technology [41] and label it intelligent cognitive assistant (ICA) technology, as this seems to better describe the capabilities such systems are designed to have. They are not only intelligent in terms of being able to converse and having a language model; they also have many other human-like abilities relating to human cognition and intelligence. ICA technology has therefore been touted as the next revolution in human-computer coexistence. The technology dates back to the beginning of AI, when one of the first chatbots was developed and made available outside of a research laboratory: Weizenbaum's simulation of a Rogerian psychotherapist called ELIZA [42]. However, technological progress has only recently laid the foundations for broad adoption in the form of ICAs such as Alexa and Siri as well as more domain-specific agents such as Woebot [17]. Alexa, Siri and Google Home, however close to certain human capabilities they may seem, still often fail outside of very basic, secretary-like tasks. When used in more expert domains, such as mental health, they quickly start repeating themselves, as they only have very generic models that end up in common phrases and trivial platitudes. Sometimes, their remarks can even be dangerous for the user, as they may be perceived as flippant and negative, or give wrong medical advice. When tested on their response to stressful accounts, they either do not understand or fail to show empathy beyond empty words [43]. Expert domains of engagement therefore need domain-specific ICAs.
ICAs, which can be deployed in many devices, e.g., as virtual agents or robots, are striving to: understand context; be adaptive and flexible; learn and develop; be autonomous; be communicative, collaborative and social; be interactive and personalized; be anticipatory and predictive; perceive; act; have internal goals and motivation; interpret; and reason. To come close to such capabilities, ICAs are embedded with a cognitive architecture (CogA), a "hypothesis about the fixed structures that provide a mind, whether in natural or artificial systems, and how they work together-in conjunction with knowledge and skills embodied within the architecture-to yield intelligent behavior in a diversity of complex environments" [44] (para. 2). Most importantly, ICAs possess the ability to converse in natural language. This seems to be the most immediate way in which humans communicate [45], and the effects of a dialogue on human mental states cannot be overestimated. ICAs, coupled with ABC capabilities, are establishing themselves as a very promising PT.
Using ICAs for ABC is still a new field of research, despite ELIZA being one of the first chatbots, as chatbots have mostly been explored for education, customer support or other simple question-answer contexts [46]. What makes ICAs for ABC unique is that users reveal personal information more freely, which makes systems more successful in their goals [47]. ABC ICAs and their users also form a more longitudinal relationship. The interactions are not one-off encounters, in which it is difficult to understand the users and act immediately with efficient strategies. This makes such ICAs able to learn from historical interactions and improve in achieving ABC. However, there is a considerable lack of standardization in the evaluation of PT and ABC support systems, which makes the research field prone to the introduction of researcher bias.
ICAs, besides being a vessel to understand users through modeling their psychological and physiological aspects and use such knowledge to enact ABC, present as an ideal platform for offering help in the field of mental health because of the ability to converse. This opens up new solutions in the field of digital mental health.

Digital Mental Health
Although the COVID-19 pandemic has revealed in full the problems that mental healthcare has [10] and has thoroughly worsened people's well-being [48], the mental health pandemic has been raging for far longer [49,50]. Various decision-makers, especially world organizations, national governments and other leaders, are starting to recognize this, which is why mental well-being appears in Goal 3 of the 17 UN Sustainable Development Goals [26]. The most common mental health issues include stress, anxiety and depression (SAD); these have seen the biggest rise in recent decades [51].
Before the COVID-19 pandemic, figures for SAD symptoms in some groups reached over 70% for overwhelming stress symptoms, which make people unable to cope [52,53], and about 8% for disorders connected with stress, such as post-traumatic stress disorder [54]; almost up to 34% for anxiety disorder [55]; and up to 27% for depressive symptoms [56] and 6% for depressive disorders [57]. With the COVID-19 pandemic, we are seeing these numbers rise [48,58,59]. The number of people with SAD symptoms and disorders has in some cases increased more than three-fold [60]. What is even more worrying is that mental health issues are very underreported, especially in developing countries [61]. Country-level reporting on the state of the population's mental health is also rough at best, and the data mostly cover the adult population [55]. This is further skewed by the fact that up to 85% of people in low- and middle-income countries receive no mental health treatment [62], treatment coverage in high-income countries for certain disorders only reaches 33% [63], and up to 96% of people with SAD do not seek treatment at all [64].
Mental health issues have substantial, multi-faceted consequences, not only affecting the patient, but also their immediate surroundings (family, caretakers) and the wider society [65]. Patients are faced with a decreased quality of life, poorer educational outcomes, lowered productivity and potential subsequent poverty, social problems, abuse vulnerabilities, and additional non-mental health problems. Patients' immediate family and caretakers face increased emotional and physical challenges, decreased household income, and increased financial costs. Society as a whole faces exacerbated public health issues, corrosion of social cohesion, and the loss of several GDP percentage points and billions of dollars in expenditure per nation annually. What ends up happening is that SAD increasingly perpetuates SAD. Too often, the direct result of this is the worst possible one: loss of human life. Many countries still struggle with a high suicide rate [51]. The reasons for the increase in SAD include a critical lack of mental health professionals and regulations [66] as well as inequality in access to care [67,68].
The conditions in which mental healthcare finds itself, especially in a post-COVID-19 world, seem to present an opportunity for the development of technological and other scientific therapy-based interventions, especially as individuals with mental health issues prefer therapies to medication [69]. Digital mental health, a still insufficiently explored area of research and practice, represents a way to explore how technology can complement existing mental healthcare systems to be more effective in delivering help to people who need it.
Technologies that increase the operability and effectiveness of healthcare are many [70][71][72][73], but we concentrate on the implications of using ICAs as PT in mental health. By focusing not only on what ICAs offer but also on the possible problems they bring, we try to provide a fair account of the potential of digital mental health as a whole.
We identified the following areas where using PT can offer positive possibilities in mental healthcare: cost, availability, stigma, and prevention. Identified negative possibilities include group exclusion, researcher bias, privacy problems, lack of longitudinal research, ethics of using personal information for persuasion, potential risks of digital dependence, potential problems of automation and job loss of mental healthcare professionals, and possible cost increase in certain aspects.

Positive possibilities:
Cost related to the service of mental healthcare professionals varies, depending not only on country standards, but also on country regulations and subsidies. It also depends highly on the number of practicing professionals. Regardless, it presents a barrier for people of lower socio-economic backgrounds [74]. PT for mental health can realistically be made free of charge (and often is [17]) due to the much lower costs attached to it [68]. PT also makes it possible to collect data on often overlooked (and disadvantaged) populations, thus lowering systemic bias in analysis, as well as to target patients with low-priority conditions.
Availability refers to location, time, and cost. Location-based availability concerns people with no direct access to mental healthcare (e.g., remote areas) [75]. Time-based availability concerns people needing help when their chosen professional is unavailable (e.g., panic attack during the night). PT may also minimize problems related to transportation [76]. Cost-based availability concerns people needing more than the minimum recommended number of hours of psychological help per week [77]. Research [77,78] shows that more frequent therapy results in better outcomes, and complementary use of PT for mental health can bridge that gap by giving people who cannot afford more therapy continued access to help.
Stigma refers to self-stigma (the prejudice which people with mental issues turn against themselves) and public stigma (the reaction that the general population has to people with mental issues). Both are prevailing problems [79], causing up to 96% of people not to seek treatment [64]. Research shows that people are more comfortable disclosing themselves to a computerized system than to a person [47]. Introducing PT for this group of people may result in offering help to people who would otherwise never receive it, as well as helping people get better to the point of visiting a professional.
Prevention refers to the blind spot in mental healthcare: people only come in (if at all) seeking treatment, while a lot of issues can be prevented beforehand. PT can work indirectly by providing "support for better decision making, emotional regulation or interpersonal interactions," which are "necessary to ensure that psychological, emotional and social deficits do not spiral into clinical disorder," or directly by improving "both the screening and early delivery of interventions to reduce risk factors and build psychological resources" [76] (p. 336). Therefore, treatment should not be the only target for PT, as prevention also lowers the number of mental health issues present and thus relieves stress on healthcare.

Negative possibilities:
Group exclusion refers to those groups of people who can not only be excluded from technology-oriented mental healthcare, but may find themselves even further distanced or even removed from society. The groups include the elderly, who have difficulties integrating technology into their lives [80]; the lowest socio-economic class, who may not benefit from PT due to their lack of access to technology [81]; and culturally specific groups, whose cultural or sociopolitical specifics prevent them from adopting technology [82]. Fortunately, PT research is emerging in certain low-income parts of the world [83].
Researcher bias refers to the lack of evaluation standards of PT for mental health in this interdisciplinary endeavor. This is due to two factors: the field's youth and the various disciplines tackling the field individually. The possible problems are many: (1) PT is not always studied in empirical experiments, but in quasi-experiments [84] or in no experiments at all, and where empirical experiments exist, they mostly involve proprietary PT, which is harder to change; (2) the metric on which to evaluate such systems is unclear (it usually comes indirectly from their effectiveness in an experiment where the goal is SAD symptom relief [12]); and (3) there is no consensus on what data are needed. This results in many unfounded presuppositions on the part of researchers, ending in problematic practice.
Cost increase refers to the possibility that using PT in mental health may delay "the provision of traditional treatments with greater evidence of efficacy or by increasing the numbers of patients receiving services" [76] (p. 336). More research is needed to be able to understand the costs alleviated and costs incurred by implementing such systems.
Other potential problems are less related to our work, but nevertheless worth mentioning: (1) the problem of personal information privacy [85]; (2) the problem of the lack of longitudinal research on behavior change with PT [86]; (3) the ethics of using personal information for persuasion [85]; (4) the potential risks of digital dependence [87]; and (5) the potential problem of automation and job loss of mental healthcare professionals.

Related Work
Due to the novel viewpoint of our review, deciding on the parameters of what constitutes related work was non-trivial. It was established that none of the review papers found covered ICAs that try to induce ABC for SAD from a technical point of view, i.e., by analyzing their software structures, the algorithms used, the datasets they utilize, etc. The review papers we present in this section therefore consist of work that analyzed such systems in a way that "aimed to inform health professionals" [11] (p. 11) as opposed to researchers in the fields of computer science.
Related work is divided into three groups: (1) papers that review the use of ICAs (under different synonyms-conversational agents, chatbots, etc.) for delivering help in mental health; (2) papers that review the use of applications in general for delivering help in mental health; and (3) papers that review the use of ICAs for delivering help in health in general.
We identified six related works belonging to the first group of papers. Provoost et al. [11] focused on embodied conversational agents (ECAs), which besides language also simulate some properties of human face-to-face conversation, including non-verbal behavior. They tried to provide an overview of the possibilities such systems present and to investigate the evidence base for their effectiveness. They found 54 studies with ECAs for treating mood disorders, anxiety, psychoses, autism, and disorders connected to substance use, which use different techniques, including reinforcement of social behaviors through expressions and multimodal conversations, to reduce symptoms. They concluded that this avenue presented an emerging and important research endeavor, with the limited results so far showing positive outcomes. They also called for more research and the production of more such systems. Vaidyam et al. [12] explored chatbots in psychiatry for assessment as well as intervention purposes. They focused on chatbots for depression, anxiety, schizophrenia, bipolar and substance abuse disorders. From 10 studies that fit their criteria, they found that the reported outcomes of using chatbots showed benefit in psychoeducation and self-adherence, and that patients found chatbots an enjoyable tool to use. They concluded that early evidence was promising; however, they called for more research from all the actors in this interdisciplinary field. Abd-Alrazaq et al. [13] identified chatbots as a possible remedy for the shortage of mental health workers, which prompted them to pool effectiveness and safety results of 12 studies on using chatbots for depression, distress, stress, and acrophobia. They found a lack of evidence on whether the chatbots' effect was clinically important, but they concluded that chatbots are safe. They warned that there is a lack of standardized evaluation metrics, resulting in high risk of bias. Gaffney et al. [14] investigated ECAs and their usability for general psychological distress. In 13 identified studies, they discovered that the efficacy and acceptability were promising, with most studies showing significant reductions in mental issue symptoms. They called for researchers to produce more work on exploring the mechanisms of change such systems can employ to increase efficacy, be they technical or not. Bendig et al. [15] focused on chatbots used in clinical psychology and psychotherapy. They found that most experiments done are pilot studies, where it is hard to produce high-quality evidence. They reported that the practicability, feasibility, and acceptance of chatbots were very promising, although such technologies were still highly experimental, especially because applying technology in such a complex domain is difficult. They ended the review calling for funding to evaluate chatbots on effectiveness, sustainability, and safety. Abd-alrazaq et al. [16] reviewed chatbots for mental health, not excluding any mental disorders or purposes of chatbots in mental health. They found 41 chatbots, some only used for screening (n = 10) or training (n = 12), while others were used for therapy (n = 17) or without a specific purpose. Most treated depression (n = 16) or autism (n = 10). Like the authors before them, Abd-alrazaq et al. called for more evidence, but recognized the possible utility of early integration of such systems in mental healthcare.
We identified four related works belonging to the second group of papers. Bakker et al. [88] focused on mental health apps in general. They discovered that the apps lack functionality and features. They also noted a lack of research on the efficacy of apps, worrying about a complete lack of trials of any kind. They presented their own recommendations for developers of such apps. Orji and Moffatt [21] reviewed 16 years of persuasive technology research and comprehensively detailed the designs, research methods, strategies and theories used to persuade, as well as the targeted behaviors. They concluded that persuasive technology was a promising avenue for wellness, but that the field was lacking longitudinal research and faced technological limitations. Chan et al. [89] surveyed the use of mobile apps in psychiatric treatments. They called on mental health practitioners to better understand how such apps are used, what their features are, what should be studied further to advance their capabilities, and what issues may arise in integrating them into clinical workflows. They concluded that patients with various mental illnesses and severities may benefit from them regardless of their social and technological backgrounds; however, better practices for evaluating apps, understanding user needs, and educating users on app use were needed to increase the apps' efficacy, on top of ensuring ethical and risk-free protocols. Torous et al. [90] researched smartphone apps and focused on their adoption by clinics or consumers, as the uptake was still low in spite of the potential of apps to improve quality and access to mental healthcare. They reported high heterogeneity in the metrics reported by studies, and found that even apps that were successful in their goals lacked user testing, privacy protection, and mechanisms that establish trustworthiness. The apps also did not tackle emergencies. They called for further research in all fields connected to this technology.
Finally, we identified five related works belonging to the third group of papers. Laranjo et al. [40] focused on conversational agents with unconstrained natural language input capabilities for any health-related purpose, targeting customers as well as health professionals. They analyzed 14 different conversational agents with mostly finite-state or frame-based dialogue systems, focusing on patient self-care. Very few presented non-quasi-experimental studies. Most reported satisfying efficacy, but they rarely evaluated patient safety. The authors ended the paper by calling for better experimental designs and standardization in such works. Montenegro et al. [91] developed a taxonomy based on 40 papers related to conversational agents applied to healthcare, and used the taxonomy to identify existing challenges and research gaps. They found many systems supporting patients as well as physicians, with a minority of systems focusing on student training. Most of the agents surveyed focus on health literacy, which the authors considered a future trend in changing health behaviors. They discovered that the most lacking areas were bringing such technology to the elderly and making advances in user involvement, which included better interactions, interfaces, and models of learning. Safi et al. [92] investigated chatbots in the medical field from a technical aspect pertaining to their development. They identified 45 studies on using chatbots for health purposes. The most common method was pattern matching, typically used for question-answer conversations that provide the information users ask for. Generating original output, as opposed to selecting a pre-existing one, was rare. Very few studies collected any user data. The authors found such systems useful for providing information to interested users. Abd-Alrazaq et al. [93] performed an overview of technical (non-clinical) metrics used for evaluating dialog agents in healthcare. By scanning 65 studies, they found 27 technical metrics, pertaining to chatbots generally, to their response generation and understanding, and to their aesthetics. Their work tries to systematize and push towards standardization of how to evaluate chatbots non-clinically. Pereira and Díaz [94] surveyed chatbots for health behavior change. They identified 30 papers that used health chatbots in their study, and found that nutritional disorders and neurological disorders were the most targeted health issues, that the chatbots tried to change human competence in tackling these issues, and that users most appreciated the personalization and consumability aspects of these chatbots. Again, the authors noted that case studies were lacking and that technological implications were almost never discussed.
The rest of the paper is organized as follows: Section 2 presents the materials and methods used for this review, focusing on research methodology, study design, research questions, search strategy, paper selection criteria, and data extraction. Section 3 presents the results, focusing on search results and paper selection, description of selected papers, main findings, and answers to the research questions. In Section 4, the work is discussed and compared to existing reviews, the technology is evaluated, and advantages and disadvantages are listed. The paper concludes with Section 5.

Research Methodology
To achieve the goal of conducting the first technical review of ICAs for ABC in mental health, we opted for a state-of-the-art (SOTA) review with some elements of a scoping review. Initial exploration of the literature led us to the same conclusion, as it revealed that more traditional systematic reviews, which put more emphasis on clear outcomes, or meta-analytic approaches, which require more comparable outcomes, are inappropriate due to the novelty and technical aspects of the field. SOTA reviews are especially appropriate for more technical analyses, particularly in fast-evolving fields of study. The review method is also appropriate when the work is more exploratory and when, as in this case, a systematization of such technologies does not yet exist. What is considered SOTA in our topic of review is as of yet unclear. However, to focus on applicable trends and directions of research, our review covers research from 2016 to 2021 (approx. 5 years), which is not an uncommon timespan for fast-developing fields [95]. We believe this limited timespan enables us to survey only the latest developments, methods and technologies used for ICAs in mental health. A SOTA review therefore helps us underpin key concepts in a research area and produce a summary that offers a better overview than other review methods and yields consistent results to solidify new technological phenomena. Using a SOTA review instead of other types should also appropriately differentiate this review from the reviews listed in Section 1.5, which largely focus on clinical outcomes instead of technical foundations.
Key objectives of this review are therefore to present a novel technological research area and its technical trends.

Study Design
Our research process followed Arksey and O'Malley's framework [96] for reviewing work. The framework provides a direction for the necessary steps in the process. The course of such an approach includes: (1) identifying the research questions; (2) identifying relevant works; (3) identifying selection criteria and applying them to step (2); (4) extracting and organizing the data; and (5) reporting the results in ways that address the research questions and satisfy the purpose of the review.
All the steps were followed by the authors as recommended in various works [97,98]. Stage (1) in the framework was conducted with regular discussions between the authors; stage (2) was conducted by the authors working individually, relying on their experience in the field and resolving any consequent discrepancies mutually; stage (3) was based on the goals of this work and the experience of the authors; stages (4) and (5) were conducted with close cooperation between the authors. Considerable attempts were made to provide a transparent and clear presentation of the research work which resulted in this paper.

Research Questions
The work does not have one central, scoping question. Technical trends are generally reflected in a number of subsystems that comprise one system, which led us to a collection of specific questions mostly regarding such subsystems. The systematization follows this process instead of being embodied in the questions themselves. However, answering these questions should lead to the heart of the phenomenon all interdisciplinary actors in this field are interested in: How do reviewed systems achieve change?

RQ1. Which mental health issues do the systems target?
RQ2. Which technologies, methods and collected data guide the process to achieve ABC for SAD in the systems?
RQ3. What are the technical aspects of the conversational models in the systems?
RQ4. What are the platforms used to create the systems?
RQ5. What domain knowledge is used to achieve ABC for SAD?
RQ6. What user modeling, especially for personalization and adaptation, do the systems conduct?
RQ7. What is the overarching cognitive architecture used in the systems?
RQ8. How are the systems evaluated in terms of ABC for SAD?

Some RQs will not have clear answers, especially as some would need clear standardization or metrics, currently not present in the field, and each of these would warrant a paper of its own. For example, evaluation, technical and clinical, seems to be left to the researchers' own devices each time they do a study, without any guidelines from a wider community. This is also why we are rather focusing on trends than on poorly defined metrics. Our RQs try to provide some direction in terms of which technical aspects of these systems are important to computer science-adjacent researchers, which is why we define our scope through them.

Search Strategy
The search query was constructed by the authors independently collecting keywords and correlating them with synonyms and related words. The construction was based on the authors' knowledge of the area as well as on the review papers described in Section 1.5. We also used the PICOC methodology [99] to further refine our search string. Below is the mutually agreed upon search query:
"chatbot" OR "conversational agent" OR "relational agent" OR "virtual agent" OR "intelligent agent" OR "cognitive agent" AND "anxiety" OR "depression" OR "mental health" OR "stress"
Preliminary searches on a wide range of databases were conducted, including querying Scopus, PubMed, EBSCOHost, Springer, the ACM Digital Library, IEEE Xplore, Google Scholar, Web of Science, EmBase, PsycINFO, Cochrane, CINAHL, Science Direct, and Inspec. However, we found that Google Scholar (as an aggregator of various scientific works as opposed to a specific database with limiting inclusion criteria) had a wide enough coverage to allow it to be used instead of the listed databases. We made this decision after discovering that this insight is consistent with empirical studies on database comparison [100][101][102]. Therefore, we decided to use only Google Scholar, complementing it with the search software Harzing's Publish or Perish, which is recommended for easier searching [103].
The search was started on 24 March 2021.
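As a minimal sketch of how the two keyword groups combine, the snippet below assembles the search string programmatically. The query as printed shows no parentheses, so the grouping used here, (agent terms) AND (issue terms), is an assumption about the intended reading; the function names are hypothetical and not part of any search tool's API.

```python
# Keyword groups copied from the search query in the text.
AGENT_TERMS = ["chatbot", "conversational agent", "relational agent",
               "virtual agent", "intelligent agent", "cognitive agent"]
ISSUE_TERMS = ["anxiety", "depression", "mental health", "stress"]


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """AND several OR-groups together (assumed grouping, see lead-in)."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(AGENT_TERMS, ISSUE_TERMS)
print(query)
```

Making the grouping explicit avoids depending on how a particular search engine resolves operator precedence between AND and OR.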

Paper Selection Criteria
Relying on our experience and knowledge of the field as well as having a clear idea of related work, we constructed a list of special criteria to apply to the paper selection process. We tried to elaborate on every decision to avoid arbitrary or biased criteria. We included all the full papers that passed all the items on the special criteria list.
The special criteria include the following items:
1. Targeted mental health issues in the paper include stress, anxiety, depression, or general well-being. We opted for this criterion as these are the most common mental health issues among the nonclinical population [57], they are seeing the biggest rise [104], they are targeted most by the systems we are interested in (see Section 1.5), and, as ascertained from related works, they are the easiest to target with technology.
2. The system in the paper is autonomous and not Wizard-of-Oz. The Wizard-of-Oz technique refers to the "seemingly autonomous application whose unimplemented functions are actually simulated by a human operator, known as the Wizard of Oz" [105] (p. 7). Since we are investigating technologies that enable exactly such functions, including Wizard-of-Oz systems would defeat the purpose of our work.
3. The conversational model of the system in the paper is text-based. We opted for text-based systems due to experts calling for such systems [11], due to our belief that text-based systems are the most mature in the technological landscape and therefore more amenable to being reviewed, and because the range of conversational systems is too wide to cover in one paper, as reviewing, e.g., speech-based systems means analyzing a completely different technology.
4. The conversational model of the system in the paper allows for synchronous, real-time two-way communication. We opted for this criterion due to the power of dialogue in matters of mental health [106], which compels us to research such systems, as well as the trending usage of ICAs in various areas of service [107], whose success also stems from the convenience of synchronous communication, seen in instant messaging systems [108].
5. The paper describing the system was published between 2016 and 2021. We opted for a SOTA review of the field, and since technology is developing fast, the last five years, recommended by other researchers as well [95], should cover the trends we want to observe.
6. The system in the paper is implemented to be used with a computer or mobile devices. We opted for this criterion, as opposed to also covering, e.g., robotic platforms, because we wanted to overview conveniently available systems, which do not demand additional resources to be accessible.
7. The system in the paper is fully functional, not a part of a bigger cognitive architecture of an ICA. The power of ICAs lies in their emergent behavior when multiple parts or modules work in concert towards producing ABC. We are interested in the system as a whole, not individual parts.
8. The system in the paper is not only a design, but was implemented and can be used. We want to explore systems that are possible to build. Only implemented systems can answer some of our questions (e.g., RQ4), especially on results that such systems produce (e.g., RQ8). We therefore believe that without this criterion, the true technical trends of the field cannot be sufficiently addressed.
9. The paper describes the system in enough technical detail to be able to analyze it from a computer science perspective. To be able to conduct a SOTA review, this criterion is necessary. Without it, barely any RQ can be addressed.
10. The system in the paper is non-proprietary. Many systems (or platforms used to build the systems) used in the most cited studies [17][18][19], reviewed by papers in Section 1.5, are proprietary. The most well-known systems or platforms are: Tess [18], Wysa [19], Woebot [17], DialogFlow [109], IBM Watson [110], Microsoft Bot Framework [111], and GPT-3 [112]. Unfortunately, proprietary work cannot be surveyed as the technologies are closed source and not described in enough detail to be able to analyze them. They work as a double black box: not only can we not discern the neural networks they use for their conversational models, we cannot even discern other methodological and technological details about them. We also want to foster open source and transparent research work, so our focus on analyzable systems should also be seen in light of this.
Another criterion that we were seriously considering, but knew we could not include, was for the system in the paper to be open source and publicly available. However, this is still such a rarity that we quickly dropped the idea.
Apart from the specific criteria list to apply to paper selection, we constructed general criteria, partly guided by the PICOC method [99].
The steps that we followed for paper selection were:
1. Use of Harzing's Publish or Perish for easier management
2. Exclusion criteria: Papers do not address "Conversational Agents" and related acronyms (population criterion I)
3. Exclusion criteria: Papers do not address "Stress," "Anxiety," "Depression," or similar words (intervention criterion II)
4. Removal of impurities: Deleting theses, dissertations, non-scientific papers, posters, review papers, books, papers with three pages or less in length
5. Quality assessment: Focusing only on peer-reviewed published papers in journals and conferences (conferences hold special importance in computer science)
6. Abstract and text filtering: Special selection criteria, not applied before, described under the special criteria subsection
Removal of duplicates was not strictly necessary due to the use of one database, but since there might be various sources for the same paper (e.g., a journal and a university website), they were removed in one of the steps (e.g., step 3) or by hand when encountered.
We used PICOC to refine our criteria to be transparent and unbiased for the final paper selection. We took inspiration from the PRISMA framework [113] for reporting and we used the PRISMA diagram to visualize the process.
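The impurity removal, quality assessment and duplicate removal steps can be sketched as a small screening filter. This is only an illustration: the record layout, the field names and the sample records are invented for the example, and the actual screening in this review was carried out with Harzing's Publish or Perish and by hand.

```python
from dataclasses import dataclass

# Document types deleted in the "removal of impurities" step of the text.
EXCLUDED_TYPES = {"thesis", "dissertation", "non-scientific",
                  "poster", "review", "book"}
MIN_PAGES = 4  # papers with three pages or less are removed


@dataclass
class Record:
    # Hypothetical record layout; these fields are assumptions.
    title: str
    doc_type: str
    pages: int
    peer_reviewed: bool


def passes_screening(rec):
    """Apply impurity removal and the peer-review quality check."""
    if rec.doc_type in EXCLUDED_TYPES:
        return False
    if rec.pages < MIN_PAGES:
        return False
    return rec.peer_reviewed


def dedupe(records):
    """Keep the first record for each case- and whitespace-normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = " ".join(rec.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up sample records, for illustration only.
records = [
    Record("A Stress Chatbot", "article", 8, True),
    Record("a stress  chatbot", "article", 8, True),  # same paper, second source
    Record("Chatbots: A Review", "review", 12, True),
    Record("Short Poster Abstract", "article", 2, True),
    Record("Unreviewed Preprint", "article", 9, False),
]
kept = [r for r in dedupe(records) if passes_screening(r)]
print([r.title for r in kept])
```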

Data Extraction
Data extraction focused on identifying keywords and parts of the text that help answer the RQs. Both authors independently extracted the data that they deemed relevant to the review's narrative and goals from the papers. Afterwards, they relied on mutual agreement for combining the extracted data.

Results
This section presents the outcome of the review process. We report the search results and the paper selection process, briefly describe the papers from the final selection, present the main findings, and describe how they answer our RQs.

Search Results and Paper Selection
The paper selection process used various filtering methods to improve the results that fit the objectives of this review and help us answer our RQs. The process included the following steps: using Harzing's Publish or Perish for easier management; ad hoc removal of duplicates; application of exclusion criteria; removal of impurities (deleting theses, dissertations, non-scientific papers, posters, review papers, books, papers with three pages or less in length), application of quality assessment criteria, and abstract and text filtering. All authors independently selected the papers and mutually agreed on the final selection.
The search and selection ended on 26 March 2021. The paper selection process with the numbers of papers encountered in each step was:
Step 1: Querying Google Scholar with the search string: n = 14,300
Step 2: Using Harzing's Publish or Perish, applying exclusion criteria (population criterion I and intervention criterion II): n = 254
Step 3: Removal of impurities, quality assessment: n = 114
Step 4: Filtering: n = 10 (number in line with similar review papers)
The PRISMA diagram in Figure 1 visualizes the process. The diagram follows the PRISMA methodology [113].
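To make the narrowing of the selection funnel concrete, the following short sketch computes how many papers were retained at each step relative to the previous one. The counts are taken from the steps above; the step labels and the function name are chosen only for this illustration.

```python
# Paper counts per selection step, as reported in the text.
FUNNEL = [
    ("Google Scholar query", 14300),
    ("Exclusion criteria", 254),
    ("Impurities and quality assessment", 114),
    ("Abstract and text filtering", 10),
]


def retention_rates(funnel):
    """Percentage of papers kept at each step relative to the previous step."""
    rates = []
    for (_, before), (name, after) in zip(funnel, funnel[1:]):
        rates.append((name, 100.0 * after / before))
    return rates


for name, rate in retention_rates(FUNNEL):
    print(f"{name}: {rate:.2f}% kept")

# Share of the initial hits that reached the final selection.
overall = 100.0 * FUNNEL[-1][1] / FUNNEL[0][1]
print(f"Overall: {overall:.3f}% of the initial hits were selected")
```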

Description of Selected Papers
The selection process yielded 10 papers that aligned with our criteria. These papers represent various approaches to achieving change in people with mental health issues. Since all of them feature full cognitive architectures for their systems, some parts of these architectures are homogeneous among the papers, while others are very heterogeneous. Different means of achieving the same outcome is a much needed pluralism that new fields of research should always be adopting, especially when outcomes refer to SAD symptom relief in people with mental health issues, which is an exceptionally complex process to undertake. The systems in this review show that there are multiple ways of doing that, which gives the research field the flexibility and diversity it needs to open up more possibilities for progress.
Delahunty et al. [37] proposed a diagnostic ICA, which combined conversational abilities with machine learning and clinical psychology. It used sequence-to-sequence neural networks for dialogue generation and machine learning classifiers for discovering depression symptoms. The goal was to facilitate crisis support for people with depression.
Denecke et al. [38] introduced SERMO, an ICA that combined methods from cognitive behavior therapy (CBT) and lexicon-based emotion recognition to support general well-being in people by regulating their emotions, thoughts, and feelings. Emotion recognition in SERMO was crucial for effective strategy selection in terms of proposed activities and dialogue help. In addition, informational strategies helped provide people with psychoeducation. User evaluation with the User Experience Questionnaire showed that the system was considered good.
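Lexicon-based emotion recognition with fuzzy matching, as named for SERMO, can be sketched in a few lines. This is a generic illustration and not SERMO's implementation: the tiny English lexicon is invented (it is not SentiWS), and the similarity measure from Python's standard `difflib` module and the cutoff of 0.8 are assumptions.

```python
import difflib
from collections import Counter

# Toy lexicon invented for illustration; it is NOT the SentiWS lexicon.
LEXICON = {
    "happy": "joy", "glad": "joy",
    "afraid": "fear", "scared": "fear",
    "angry": "anger", "furious": "anger",
    "sad": "sadness", "lonely": "sadness",
}


def recognize_emotions(text, cutoff=0.8):
    """Count emotions whose lexicon words match the input closely enough.

    The cutoff is an assumed similarity threshold (0..1) that lets
    misspelled words still hit their lexicon entry.
    """
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if not word:
            continue
        match = difflib.get_close_matches(word, list(LEXICON), n=1, cutoff=cutoff)
        if match:
            counts[LEXICON[match[0]]] += 1
    return counts


print(recognize_emotions("I am scaredd and a bit angrry"))
```

The misspelled words still resolve to their lexicon entries, which is the role fuzzy matching plays on top of exact lexicon lookup.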
Ghandeharioun et al. [114] focused on delivering ecological momentary interventions through an ICA to raise people's general well-being by relieving SAD symptoms. The system EMMA provided emotionally appropriate interventions in an empathetic manner, detecting users' moods solely through smartphone sensor data, which was integrated with the ICA. Their results showed that their personalized machine learning model, used to determine the moods, was well liked by the participants.
Khadikar et al. [115] developed Buddy, an ICA that targeted general well-being by treating symptoms of SAD, but also working as a motivational companion to help with loss of focus. The system used recurrent neural networks (RNNs) to respond to the users' emotions with appropriate dialogues that built mental resilience and drove the conversation towards positive thoughts.
Morris et al. [43] designed an ICA that simulated human capabilities in empathy expression. They repurposed online peer support data, which the ICA presented to the user through corpus-based approaches. Information retrieval and word embedding techniques produced the best matches to the user's concerns. In a controlled experiment, the users found such responses acceptable.
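The corpus-based matching idea can be illustrated with a minimal retrieval sketch: the user's message is compared with stored concerns and the reply attached to the most similar one is returned if its score is satisfactory. Note the substitution: plain term-count vectors with cosine similarity stand in for the word embeddings used by Morris et al., and the corpus entries, the threshold and all names are invented for the example.

```python
import math
import re
from collections import Counter

# Invented (concern, supportive reply) pairs; NOT data from the original corpus.
CORPUS = [
    ("i failed my exam and feel worthless",
     "One exam does not define your worth. What would you tell a friend?"),
    ("i cannot sleep because of work stress",
     "Work stress can be exhausting. Could a short wind-down routine help?"),
    ("my friends ignore me and i feel alone",
     "Feeling left out hurts. Is there one person you could reach out to?"),
]


def vectorize(text):
    """Bag-of-words term counts (a stand-in for word embeddings)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_reply(user_text, min_score=0.2):
    """Return the reply of the closest stored concern, or None if no match
    reaches the (assumed) minimum similarity score."""
    query = vectorize(user_text)
    score, reply = max((cosine(query, vectorize(post)), reply)
                       for post, reply in CORPUS)
    return reply if score >= min_score else None


print(best_reply("Work stress keeps me awake, I cannot sleep"))
```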
Park et al. [116] delivered a prototype ICA, Bonobot, that used motivational interviewing methods to help students cope with stress. It used conversational sequences to guide the users through the motivational interviewing processes, providing evocative questions, encouraging feedback, and reflective and affirming responses, placed in the context of the users' problems. The major focus of Bonobot was discussing the idea of change. When used in an experiment, participants were satisfied with the ICA, but pointed out that more personalized feedback and informational support would benefit the system.
Pola and Chetty [117] created an ICA that offers behavioral therapy to people with depression. The ICA tried to get information from the user on their mental state. It could detect seven types of emotions from text using a long short-term memory (LSTM) neural network and a pre-trained weighted word index known as glove2. The ICA's main strategy was trying to have a dialogue about the users' negative thoughts and offer different perspectives on them.
Rishabh and Anuradha [118] built three different ICAs for general well-being, using different technologies. The first, based on the famous psychotherapeutic chatbot ELIZA [42], used retrieval approaches for its language capabilities. The second, based on another famous chatbot, ALICE [119], used AIML (Artificial Intelligence Markup Language). The third used generative approaches. All of them tried to gauge the context that users conveyed to them through text and guide the conversation towards more positive sentiment.
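For readers unfamiliar with ELIZA-style conversation, the following is a minimal sketch of the classic pattern-and-reflection technique associated with ELIZA. It is not the retrieval-based pipeline built by Rishabh and Anuradha; the rules, the pronoun table and the function names are invented for illustration.

```python
import re

# Pronoun swaps used to mirror the user's statement back to them.
REFLECTIONS = {"i": "you", "am": "are", "my": "your",
               "me": "you", "i'm": "you are"}

# (pattern, response template) rules, invented for this example.
# The last rule is a catch-all fallback.
RULES = [
    (re.compile(r"\bi feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"(.*)"), "Please tell me more."),
]


def reflect(fragment):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())


def respond(text):
    """Answer with the first rule whose pattern matches the input."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."


print(respond("I feel ignored by my family."))
```

Rule order matters in this design: more specific patterns come first, and the catch-all keeps the conversation going when nothing else matches.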
Yorita et al. [120] proposed a stress management framework with an ICA platform working on computers, mobile devices as well as in robots. It derived various stress measures and modeled their users, which determined the strategy selection in their peer support model. Interventions targeted various factors that aim at different stress management skills. The process was driven by reinforcement learning in combination with fuzzy control. Their results show that after using the ICA, people displayed better skills at dealing with stress.
Yorita et al. [84] built on the ICA from Yorita et al. [120], expanding the models and the help strategies employed in order to personalize their system even further.

Main Findings
This subsection presents some of the more general findings that led us to answer our RQs in the next subsection. The overall summary of our findings is presented in Figures 2-9, while more in-depth findings can be found throughout Section 3.4 and parts of Section 4. Figure 2 is the most general one and presents the included papers per year. Figure 3 shows the number of papers that featured ICAs with conversational models based on neural networks, the most popular generative method for natural language understanding and generation, and the number based on rule-based or other machine learning types. Figure 4 shows the number of papers that featured ICAs with non-conversational models (e.g., classifiers for stress level) based on neural networks and the number based on rule-based or other machine learning types. Figure 5 shows papers that featured ICAs that used various methods to personalize and adapt their actions. Figure 6 shows the number of papers that featured ICAs that built their own complete cognitive architectures, and the number that used existing (open source) platforms to create their architecture or that used existing ICAs and upgraded them. Figure 7 shows the number of papers that featured ICAs tackling specific mental health issues. Figure 8 shows the number of papers that featured a user study on relieving SAD, that featured a user study on the system, and that were only evaluated by the authors. Figure 9 shows the number of papers that featured ICAs that only did assessment, that only did intervention, and that did both.

Answering the Research Questions
To answer the research questions, both authors independently identified parts of the reviewed papers, relevant for each RQ. The extracted information was synthesized by mutually agreeing on what information answers our RQs. To present the answers in a transparent and clear way to allow for easier comparison between the reviewed works, we answer each RQ with a comparison table.

RQ1. Which Mental Health Issues Do the Systems Target?
To answer RQ1, we scanned the reviewed works for information on which mental health issues they target. This information was mostly presented in the titles, although sometimes it was more implicit, e.g., in the data collection and intervention techniques used. Table 1 presents the results and the answer to RQ1.
To answer RQ2, we scanned the reviewed works for information on which data was collected from the users by the authors and their systems, which datasets the authors used to train or augment their systems, which methods the systems were built on to produce ABC for SAD, and which overall technologies were used. All of the listed items had to have a specific purpose in producing ABC as opposed to, e.g., general conversational abilities of the system. There is a general process in treating mental health issues, which broadly consists of two steps: assessment and intervention [121,122]. We used this process as a further framework through which we viewed the reviewed works when looking for the answer to RQ2. Therefore, Table 2 presents the results and a part of the answer to RQ2 with regard to the assessment capabilities of the reviewed systems, while Table 3 presents the results and a part of the answer to RQ2 with regard to the intervention capabilities of the reviewed systems.
Table 2. Answering RQ2. Which technologies, methods and collected data guide the process to achieve ABC for SAD in the systems? First step in the process: Assessment.
Work Assessment
[37] The system tries to classify depression, suicidal ideation, insomnia and hypersomnia, weight change, and excessive or inappropriate guilt from linguistic user input. It trains on various datasets (eRisk, Reddit posts from users and subreddits). It extracts linguistic features from the text and uses doc2vec to vectorize it, employing feature recognition and a text embedding approach to construct classifiers. It finally applies Random Forest and logistic regression to predict the presence or absence of depression symptoms. The overall F1-Score for classifiers was 0.91.
[38] The system uses a lexicon-based approach using the SentiWS lexicon to conduct sentiment and emotion recognition in the linguistic user input. It further applies fuzzy matching to recognize emotions from words that are similar enough to convey the same meaning. The system achieved 81% accuracy in recognizing emotions in a dataset of forum posts.
[114] The system collects geolocation data from a phone, connected to the ICA, and user ID, gender, baseline scores of the big five personality test, PANAS (Positive and Negative Affect Scale, short version), and DASS (Depression, Anxiety and Stress Scale). PANAS quantifies mood and DASS captures depression, anxiety, and stress symptoms. It applies experience sampling five times a day using a visual grid based on Russell's two-dimensional model of emotion to capture ground-truth labels. The affect is inferred by the system using a personalized model with Random Forest regression for valence prediction (82.4% accuracy) and AdaBoost regression for arousal prediction (65.7% accuracy).
[115] The system does not explicitly assess users and uses no specific assessment methods. Assessment is implicit in the linguistic intent recognition in the conversational model.
[43] The system does not explicitly assess users and uses no specific assessment methods. Assessment is implicit in its matching capabilities where the user input is matched with the closest reply from the used database.
[116] The system does not explicitly assess users and uses no specific assessment methods. It uses evocative questions to collect linguistic user input. Afterwards, it uses keywords from the linguistic user input which guide the conversation; these keywords can convey mental states. The keywords were acquired from a dataset that collected data from Reddit subreddits.
[117] The system uses questions that target emotional states of the users to gain relevant user input. It then uses a model to detect seven types of emotions. The model uses a long short-term memory (LSTM) neural network with glove2 for emotion recognition. The model is trained on the ISEAR dataset. The accuracy of emotion recognition obtained was 84%. Furthermore, it labels users into five states according to the detected emotional levels: zero depression, slightly stressed, highly stressed, slightly depressed, highly depressed.
[118] The systems do not explicitly assess users and use no specific assessment methods. Assessment is either implicit in the linguistic intent recognition in the conversational model or it uses keywords from the linguistic user input which guide the conversation.
[120] The system uses fuzzy inference to evaluate the content of the linguistic user input as replies to various intentional questions and to detect users' state of stress. The users are measured on Comprehensibility (Co), Manageability (Ma) and Meaningfulness (Me). "Comprehensibility means that people can understand their situation and predict their near future. Manageability is a sense that people can manage their situation. Meaningfulness means people can understand the meaning of their life." [120] (p. 3763). This determines users' Sense of Coherence (SOC) model, which is used for various strategies to increase stress management.
[84] See [120]

Table 3. Answering RQ2. Which technologies, methods and collected data guide the process to achieve ABC for SAD in the systems? Second step in the process: Intervention.
Work Intervention
[37] The system does not deliver interventions and has no specific intervention methods.
[38] The system delivers suggestions for activities and exercises that help regulate emotions in the form of a dialogue, reminds the user of appointments and implements CBT techniques, e.g., mindfulness and focusing on goals. The dialogues vary depending on detected emotions and are mostly of an informational nature.
[114] The system delivers well-being interventions which include individual or social activities from a range of psychotherapeutic categories: positive psychology, cognitive behavioral, meta-cognitive, or somatic interventions. They are delivered through a textual prompt to the user with various digital tools to engage with the activity. The dialogue the system produces is based on the emotions detected: it selects a random pre-written script from an emotional category congruent with the user's state (e.g., if a person is identified to have emotions of low valence and arousal, the system produces the following: "Feeling glum? I have a skill that might brighten your day. Let us practice.").
[115] The system delivers interventions in the form of positive drivers inserted in the conversation to change the trend of the users' thoughts. It also targets self-expression development and stress management. CBT techniques, motivational interviewing and analysis, positive behavior support, behavioral reinforcement, and guided actions and methods are used to encourage the user to build emotional resilience skills. Actions, such as meditation, are encouraged at different moments.
[43] The system delivers interventions in the form of preexisting emotional support statements, drawn from a large corpus of online interactions from the Koko platform, a platform that connects users seeking help and those who have opted to give help. The users needing help also evaluate the responses. This corpus-based approach tries to create the semblance of personalized, empathic expression. The system uses information retrieval techniques and word embeddings to automate this process in real time, matching existing statements to appropriate inputs by the users and selecting texts that have satisfactory scores. The interaction between the system and the user is one-off: the user describes her situation and the system matches a reply from the dataset. The answers are presented as if authored by the system.
[116] The system delivers interventions in the form of motivational interviewing. It can only use predefined responses, which depend on the stage of the process the user is in. These stages are Engaging, Focusing, Evoking, and Planning, where: "In Engaging, Bonobot shares brief introductions with the user and gives instructions to use the chatbot. In Focusing, Bonobot asks the user to detail their problem, possibly having them identify an inner struggle. This leads to Evoking, where Bonobot explores future goals with the user, affirming their own ideas for change. Finally, Bonobot invites the user to ponder the overall session in Planning." [116] (p. 3). The process helps users cope with stress and encourages self-reflection.
[117] The system delivers interventions in the form of emotional conversational support, suggesting different, more positive perspectives on situations the users describe, and trying to prevent negative thoughts. The conversation is guided by the level of mental health issue detected.
[118] The systems deliver interventions differently. The ELIZA-based ICA uses Rogerian reflection to engage with the users. Information retrieval techniques are used to choose proper responses: the n-gram technique, charagram embeddings, word similarity, sentence similarity, and part-of-speech tagging. The ALICE-based ICA delivers interventions by sympathizing with the user and using CBT techniques. It uses AIML and sklearn to match responses, as well as category tagging and synonym switching for conversational dynamicity. The generative language ICA only implicitly delivers interventions by being trained on empathetic text.
[120] The system delivers interventions that help improve the users' self-efficacy, which helps manage stress, as it measures users' sense of task performance and whether they feel they can do a task or not. The system, drawing from the user model which is based on the SOC model, engages the Peer Support model, which finds suitable support types and delivers them. The system uses reinforcement learning and fuzzy control to find the best Peer Support types for specific SOC models. Peer support also stimulates various aspects of a person to lower stress levels. The types of support are helper therapy (the user takes the role of the carer instead of being cared for), informational support, esteem support, and emotional support.
[84] See [120]. The authors upgraded the system by expanding the helper therapy support type: the user acts as a carer offering either informational or emotional support, depending on their SOC.
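The script selection described for [114] in Table 3 (a random pre-written script drawn from an emotional category congruent with the user's state) can be illustrated with a minimal sketch. It is not the authors' implementation: the quadrant thresholds, the function names and all scripts except the quoted low-valence, low-arousal one are assumptions made for illustration.

```python
import random
from typing import Optional, Tuple

# Pre-written scripts per emotional category. Only the ("low", "low") script
# is quoted in the review; the other three are hypothetical placeholders.
SCRIPTS = {
    ("low", "low"): ["Feeling glum? I have a skill that might brighten your day. Let us practice."],
    ("low", "high"): ["That sounds tense. Shall we try a short breathing exercise?"],
    ("high", "low"): ["You seem calm. Would you like to note one thing that went well today?"],
    ("high", "high"): ["You sound energized. How about sharing that with a friend?"],
}


def quadrant(valence: float, arousal: float) -> Tuple[str, str]:
    """Map valence and arousal in [-1, 1] to a quadrant (assumed threshold: 0)."""
    return ("high" if valence >= 0 else "low", "high" if arousal >= 0 else "low")


def pick_script(valence: float, arousal: float,
                rng: Optional[random.Random] = None) -> str:
    """Select a random script from the category congruent with the user's state."""
    rng = rng or random.Random()
    return rng.choice(SCRIPTS[quadrant(valence, arousal)])


if __name__ == "__main__":
    print(pick_script(valence=-0.5, arousal=-0.2))
```

Keeping the scripts in a plain mapping keeps the intervention content reviewable by domain experts, which matches the mostly scripted nature of the reviewed systems.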

RQ3. What Are the Technical Aspects of the Conversational Models in the Systems?
To answer RQ3, we scanned the reviewed works for information on which methods were used to build the conversational models in the reviewed systems. Generally, we see two approaches: rule-based, dialogue tree conversational models with either free text or button-based user input options (more control, fewer errors, but limited conversational experiences), and generative models with free text options (less control, more errors, more affordances for conversation). Table 4 presents the results and the answer to RQ3.

Table 4. Answering RQ3. What are the technical aspects of the conversational models in the systems?

Work Conversational Model
[37] The system's conversational model was trained with a seq2seq (OpenNMT) learning approach on datasets from Reddit's subreddits, the eRisk dataset and the OpenSubtitles dataset using neural networks.
[38] The system's conversational model is built on the Syn.Bot framework, which uses Oscova as the bot development platform and the SIML (Synthetic Intelligence Markup Language) interpreter. The model lets the users frame answers in their own words and select predefined answers.
[114] The system's conversational model works on textual prompts and scripted phrasings that are utilized at contextually appropriate times.
[115] The system's conversational model uses RNNs for learning as well as understanding and generating responses. The intent in the user input is recognized by a long short-term memory (LSTM) neural network.
[43] The system's conversational model consists of two modules. The front-end module pairs previous responses with user inputs. The back-end module generates output using Elasticsearch, word2vec and a word-embedding procedure. The authors used the Google News dataset for training. The ICA also solicits user feedback.
[116] The system's conversational model extends ELIZA, basing its functionalities on identifying user keywords to generate responses. It consists of two modules, Flow Manager and Response Generator. Flow Manager runs the conversation and assigns template responses to lead the user. Response Generator follows the conversational flow and sequences, identifying keywords by weighting them and assembling responses.
[117] The system's conversational model is built by the authors using word embeddings, word2vec, GloVe, pre-written questions, and trained responses to create an environment for generative, free-text conversation.
[118] The three systems' conversational models are built with three different approaches: (1) the Retrieval Pattern Matching ICA is built on ELIZA, using the n-gram technique to get relevant responses, Charagram embeddings to learn character-based compositional models to embed textual sequences, and word similarity, sentence similarity and part-of-speech tagging for evaluation; (2) the Retrieval Rule Based AIML ICA is built on ALICE, using sklearn alongside the AIML library and various rules to generate a response; (3) the generative ICA learns on the data from The Open American National Corpus, using the long short-term memory (LSTM) method and context learning for understanding input, and using Beam Search to choose a response.
[120] The system's conversational model is rule-based, basing its responses on a stored databank. The user can communicate by inputting free text or by selecting fixed inputs. The outputs are also based on the classification of the moods of the users, detected using machine learning (see Table 2).
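The rule-based, dialogue tree approach with fixed and free-text input, which Table 4 contrasts with generative models, can be sketched as follows. This is a minimal sketch under stated assumptions: the node names, prompts and the naive substring matching are hypothetical and are not taken from any of the reviewed systems.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prompt: str
    # Maps a fixed (button) answer to the id of the next node.
    options: Dict[str, str] = field(default_factory=dict)
    # Node to move to when free text matches none of the options.
    fallback: Optional[str] = None


# Hypothetical four-node tree; real systems would hold far more paths.
TREE = {
    "start": Node("How are you feeling today?",
                  {"stressed": "stress", "fine": "end"}, fallback="clarify"),
    "clarify": Node("Would you say you feel stressed or fine?",
                    {"stressed": "stress", "fine": "end"}, fallback="clarify"),
    "stress": Node("Would you like a short breathing exercise?",
                   {"yes": "end", "no": "end"}, fallback="end"),
    "end": Node("Thank you for talking with me."),
}


def next_node(current: str, user_input: str) -> str:
    """Return the id of the next node for a button press or a free-text reply."""
    node = TREE[current]
    text = user_input.lower()
    for option, target in node.options.items():
        if option in text:  # naive substring match, an assumption of this sketch
            return target
    return node.fallback or current


if __name__ == "__main__":
    state = "start"
    for reply in ["not sure", "I feel really stressed", "yes"]:
        print(TREE[state].prompt, "->", reply)
        state = next_node(state, reply)
    print(TREE[state].prompt)
```

The trade-off named in the text is visible here: every reachable reply is authored in advance (more control, fewer errors), while anything outside the listed options only triggers a fallback prompt (limited conversational experience).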
To answer RQ4, we scanned the reviewed works for how the ICAs were built. We focused on whether various platforms were used to produce the ICA (e.g., Rasa [123]) or whether an existing ICA and its framework were used and possibly upgraded (e.g., ELIZA). As this was one of our exclusion criteria, we did not consider papers with ICAs built on proprietary, closed-source platforms (e.g., DialogFlow). Table 5 presents the results and the answer to RQ4.

Work Platforms and Frameworks
[37] No existing platform or framework/Not reported
[38] Syn.Bot, OSCOVA
[114] StudyPortal platform (extricated from [124])
[115] No existing platform or framework/Not reported
[43] No existing platform or framework/Not reported
[116] Extended ELIZA framework
[117] No existing platform or framework/Not reported
[118] Extended ELIZA framework, extended ALICE framework; no platform/framework reported for the third ICA
[120] No existing platform or framework/Not reported
[84] LINE Platform

3.4.5. RQ5. What Domain Knowledge Is Used to Achieve ABC for SAD?
To answer RQ5, we scanned the reviewed works for the domain knowledge, particularly from mental health and ABC theories, that is integrated into the systems. This may be through the strategies that the systems deploy to produce ABC, e.g., CBT techniques, or through user modeling, where knowledge on SAD helps make the systems more empathetic. Table 6 presents the results and the answer to RQ5.

Table 6. Answering RQ5. What domain knowledge is used to achieve ABC for SAD?

Work Domain Knowledge
[37] See Table 2
[38] The system reflects knowledge on self-reflection, tracking, monitoring (diaries), ABC theory, information provision, and CBT techniques like mindfulness and goal-attainment.
[114] The system reflects knowledge on positive psychology, cognitive behavioral, meta-cognitive, or somatic interventions as well as emotion theory like Russell's circumplex model.
[115] The system reflects knowledge on "self-help practices such as CBT, motivational interviewing and analysis, positive behavior support, behavioral reinforcement and guided actions and methods to encourage the user to build emotional resilience skills. It helps the user to manage their stress, anxiety, overthinking, energy, helps in focus, promotes meditation and encourages the same, and other situations." [115] (p. 122)
[43] The system reflects no explicitly discernible domain knowledge.
[116] The system reflects knowledge on motivational interviewing, stress management, and self-reflection.
[117] The system reflects knowledge on emotion theory (seven basic emotions), emotional support and evocative questions.
[118] The system reflects knowledge on Rogerian reflection.
[120] The system reflects knowledge on the SOC model, Generalized Resistance Resources, helper therapy, informational support, and emotional support.
To answer RQ6, we scanned the reviewed works for what kind of data is collected on the users for the user model, and how the user is further modeled. We were also interested in how this affects the working of the system, especially in terms of how the system is personalized and how it adapts to individual users. Table 7 presents the results and the answer to RQ6.

Table 7. Answering RQ6. What user modeling, especially for personalization and adaptation, do the systems conduct?

Work User Modeling
[37] See Table 2
[38] The system builds the user model on the emotion data, which it uses to personalize dialogues.
[114] The system builds the user model on the following data: "user ID, gender, baseline scores of the big five personality test, PANAS (Positive and Negative Affect Scale, short version), and DASS (Depression, Anxiety and Stress Scale). PANAS quantifies mood and DASS captures depression, anxiety, and stress symptoms." [114] (p. 16). It also contains data on "experience sampling five times a day using a visual grid based on Russel's two-dimensional model of emotion." [114] (p. 16). It uses this data to select among different emotionally charged phrasings.
[115] The system does not build any explicit user models.
[43] The system does not build any explicit user models.
[116] The system does not build any explicit user models.
[117] See Table 2
[118] The systems do not build any explicit user models.
[120] The system builds the user model on the following data: data from the SOC model, Perceived Stress Scale, Ryff's Psychological Well-Being Scales, and Hassles Scale. Each user has a continually updated SOC model. Generalized Resistance Resources connect other data to the SOC model.
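To make the kind of user model summarized in Table 7 more concrete, the following minimal sketch combines baseline questionnaire scores, sampled mood points and a continually updated three-component SOC estimate. The field names, the [0, 1] scaling and the moving-average update rule are assumptions for illustration only; the reviewed works do not specify their data structures at this level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class UserModel:
    user_id: str
    # Baseline questionnaire scores keyed by scale name, e.g. "PANAS" or "DASS".
    baseline: Dict[str, float] = field(default_factory=dict)
    # SOC components Co, Ma, Me, assumed here to be scaled to [0, 1].
    soc: Dict[str, float] = field(
        default_factory=lambda: {"Co": 0.5, "Ma": 0.5, "Me": 0.5})
    # Experience samples as (valence, arousal) pairs.
    mood_samples: List[Tuple[float, float]] = field(default_factory=list)

    def add_mood_sample(self, valence: float, arousal: float) -> None:
        self.mood_samples.append((valence, arousal))

    def update_soc(self, component: str, observed: float, rate: float = 0.2) -> None:
        """Move one SOC component towards a newly observed value.

        An exponential moving average is an assumption of this sketch,
        not the update rule of any reviewed system.
        """
        self.soc[component] += rate * (observed - self.soc[component])

    def mean_valence(self) -> Optional[float]:
        """Average sampled valence, or None if nothing has been sampled yet."""
        if not self.mood_samples:
            return None
        return sum(v for v, _ in self.mood_samples) / len(self.mood_samples)


if __name__ == "__main__":
    user = UserModel("u1", baseline={"DASS": 12.0})
    user.add_mood_sample(-0.4, 0.1)
    user.update_soc("Co", observed=1.0)
    print(user.soc, user.mean_valence())
```

A structure like this is what allows personalization: an intervention selector can read the current SOC components or recent mood rather than reacting to the last message only.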
To answer RQ7, we scanned the reviewed works to see if they refer to any kind of specific, pre-defined cognitive architecture (e.g., Belief-Desire-Intention architecture) they followed when constructing the system. If they did not, we were interested to see which modules comprise the cognitive architecture. Table 8 presents the results and the answer to RQ7.

Table 8. Answering RQ7. What is the overarching cognitive architecture used in the systems?

Work Cognitive Architecture
[37] Not specified; modules for various mental issue problems detection, conversational model for question formation
[38] Syn.Bot architecture (including OSCOVA and SIML)
[114] Not specified; geolocation-emotion prediction module, personalized textual interventions module
[115] Not specified; language learning module (RNN), user understanding module (NLP), response generator (NLP) with psychological techniques
[43] Not specified; pairing module, user feedback module
[116] Not specified; flow manager, response generator
[117] Not specified; mental state classification module, response generator, user model
[118] Not specified; ELIZA-based system: pattern matching module, response generator; ALICE-based system: self learning module, response generator; Generative system: training module, context module, generalization module, response generator
[120] Belief-Desire-Intention architecture
[84] See [120]

3.4.8. RQ8. How Are the Systems Evaluated in Terms of ABC for SAD?
To answer RQ8, we scanned the reviewed works for how the systems were evaluated, focusing on user-tested evaluation. Ideally, we wanted to see the mental health outcomes after using the system, but we also extracted data on user evaluation of the system's properties. Table 9 presents the results and the answer to RQ8.

Table 9. Answering RQ8. How are the systems evaluated in terms of ABC for SAD?
Work Evaluation
[37] No evaluation on users
[38] Tested on users and mental health professionals on the system's Attractiveness (users: below average; professionals: good), Perspicuity (users: above average; professionals: above average), Efficiency (users: below average; professionals: above average), Dependability (users: bad; professionals: below average), Stimulation (users: bad; professionals: above average), and Novelty (users: below average; professionals: excellent).
[114] No evaluation on users
[115] No evaluation on users
[43] Tested on users, who compared the system's replies to their peers' replies with three scores: good (system: >40%; peers: >60%), ok (system: <40%; peers: <40%), and bad (system: >20%; peers: <10%).
[116] Tested on users, who described the system as having evocative questions and offering self-reflection as well as potential consolidation, but noted that the feedback was clichéd. The users also wanted more informational support from the system and more suitably contextualized feedback.
[117] No evaluation on users
[118] No evaluation on users
[120] Tested on users who used the system for five days. The system managed to improve their scores on stress management skills, reflected in the SOC model, and their stress levels fell (approx. 30% improvement).
[84] Tested on users who used the system for 3 days. The system managed to improve their scores on stress management skills, reflected in the SOC model.
The answers to the research questions give a thorough and detailed insight into how the reviewed systems produce ABC for SAD, especially in their underlying technical mechanisms. This is especially relevant for seeing what kind of data should be collected on users, how they should be modeled to personalize and adapt ICAs, how the latter should converse with the users, etc. The tables with results, which allow for easy comparisons, show the current SOTA in producing change in stress, anxiety and depression with autonomous dialogue systems. The following section discusses our work in comparison to the reviews in Section 1.5, and our results from answering the RQs, especially in the light of their significance in the wider technological landscape.

Discussion
What we have discerned with this comprehensive review is that the technically inclined research community for the reviewed systems is not large. Consequently, the discussions that can be had at this point are necessarily limited. Nevertheless, this section tries to weave a wider narrative on ICAs for ABC in mental health, backed by the results from the previous section.

Comparison of Existing Reviews
This review fundamentally differs from existing reviews (covered in Section 1.5) to the degree where we are confident calling it the first review of its kind. While the other reviews cover similar technologies, their research questions and selection criteria were entirely different. Generally, they focused on delivering a systematized review for health practitioners, which molded their research questions in the direction of looking at the outcomes of using such systems in terms of how they influence the users' mental health. They were evaluating the possible benefits such systems could have if used in mental healthcare. Due to this focus, their selection criteria did not consider whether there is enough technical information on the reviewed system to analyze it. This meant that they included mostly proprietary, commercial systems, which give no insight into how they are built. Such reviews, although immensely valuable for the interdisciplinary research area, do not provide the knowledge that people involved in system development could use to further advance the current technological landscape. Our work therefore stands alone in reviewing and systematizing the trends of currently non-proprietary ICAs for ABC and SAD symptom relief. We believe this work will help researchers developing such systems to ground their efforts, get ideas, and potentially find communities. The paper may also help health professionals get acquainted with the technology they might be using in the future, and to better understand it, potentially increasing their trust in introducing technology into mental healthcare.

Comparison of Systems from Selected Papers
The approaches to ABC observed in the reviewed systems vary considerably. It is useful to compare the technical underpinnings of systems targeting the same mental health issue.
Targeting stress, both systems by Yorita et al. [84,120] produced experimental results in reducing stress symptoms comparable to the SOTA results observed in the review papers in Section 1.5. We believe the systems achieved this by: having strong theoretical grounds for assessment, which produced comprehensive user models of the users' stress management skills as well as other psychological aspects; having explicitly personalized interventions, which were selected from a wide range of possibilities according to the factors in the user model; having a rule-based conversational model, which guided the user down appropriate dialogue paths instead of leaving the freedom to go off-topic (or down erroneous paths) as in free text conversational models; basing their domain knowledge on a few carefully selected psychological frameworks, such as the SOC model, helper therapy, informational support, and others; and choosing a well-supported cognitive architecture, the Belief-Desire-Intention architecture, to build the system on. Other systems targeting stress lacked such a comprehensive architecture in terms of their modules. Some built comprehensive user models but lacked the depth of personalized strategies rooted in theory, opting for a few pre-written responses [114]; some did not explicitly assess and intervene, opting for approaches that are more dependent on unsupervised understanding of and responding to users [115,118]; some produced very rigid and static systems based on many top-down elements of assessment and intervention, either through matching with already existing responses [116] or by following a very strict and limited conversational path [43]. It therefore seems that a strong user model with an intelligent combination of rigidity and freedom in assessment and intervention methods, delivered through a guided conversation, produces the best results.
Targeting anxiety, no systems with experimental results targeting symptom reduction were found. Two systems targeted anxiety, but only very generally, either through a few pre-written responses [114], or by opting for dialogue freedom through a generative conversational model [115]. Ghandeharioun et al. [114], however, built their system around explicit assessment, using Random Forest and AdaBoost with satisfactory results to infer mood from a comprehensive user data model, which might be a better option than implicit assessment.
Targeting depression, Delahunty et al. [37] presented the most comprehensive system for depression assessment, building various classifiers for depression symptoms that are applied to the input text. Random Forest and logistic regression were used to infer the presence of depression, suicidal ideation, insomnia and hypersomnia, weight change, and excessive or inappropriate guilt. This appears to be a more nuanced way to assess users than opting for general mental health issue labels. However, their system was assessment only. The systems of Ghandeharioun et al. [114] and Khadikar et al. [115] were already covered in the previous paragraphs, and the same evaluation applies here.
Systems targeting general well-being are harder to compare, but Denecke et al. [38] seemed to follow the formula of Yorita et al. in terms of building a comprehensive system with the right combination of rigidness and dynamicity in assessment, intervention and guided conversation. The system's performance seemed to be based on their assessment methods, which used a lexicon approach to extract linguistic features and infer emotions in the text.
In summary, successful systems seem to base their performance on a comprehensive user model, explicit and theoretically backed assessment with classification models (instead of only collecting questionnaire results), explicit and personalized intervention with many strategic possibilities, and a dialogue tree conversational model. In the reviewed systems, as in many other areas, the tasks that call for machine learning were handled well by ensemble methods, such as Random Forest.

Technology Evaluation
An overall technological evaluation of the existing systems is difficult because proprietary systems are so prevalent. ICAs like Woebot [17], Tess [18] and Wysa [19] seem to possess architectures with SOTA ABC capabilities for SAD, and it is a shame that we were not able to include them in our research.
There are a few clear insights into the preferable technologies that the reviewed ICAs are built on. The first noticeable element is the intricate connection between the technology and the goals of such ICAs. Here, it can be discerned that conversational models in most cases are built to be fairly limited compared to what is otherwise SOTA in the field of chatbots. They have to be limited: mental health counselling is a very delicate matter, and preventing generative models from going out of control should be one of the primary concerns, as their becoming complicit in the deterioration of the user's mental health is a real danger. This was seen in the case of the currently most advanced language model, OpenAI's GPT-3 [112]. GPT-3 was tested by a tester simulating a patient. When the tester simply wanted to book an appointment with a doctor, GPT-3 acted as a human would, understanding the tester's intents with no problems. However, beyond such surface tasks and conversations, GPT-3 began not only to falter, but to exhibit very dangerous behaviors. When the tester expressed that she felt bad and needed help, GPT-3 answered that it could help, and when the tester expressed suicidal thoughts, GPT-3 recommended that the tester kill herself [125]. That this occurred with the most advanced language model in the world, produced by the leading AI research organization, is worrying to say the least. To researchers in this field, it signals not only how careful they have to be, but also that the systems they build have to be very domain-oriented and should limit their linguistic capabilities as far as is reasonable. In the domain of mental health, it is clear that the free text capabilities of ICAs are not on the level where they could be feasibly used, and that generally, NLP research is not yet advanced enough to be considered for such domains [126]. When such models are used, they have to be substantially improved in very domain-specific ways, making the systems non-scalable.
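One way to limit a generative model in the sense discussed above is to wrap it in a guard that checks both the user input and the generated candidate before anything is shown. The sketch below is purely illustrative: the patterns, the scripted replies and the `generate` callable are hypothetical, the pattern list is not clinically validated, and none of the reviewed systems is known to use this exact mechanism.

```python
import re
from typing import Callable

# Illustrative patterns only; a deployed system would need expert-curated,
# clinically validated detection rather than a short regex list.
RISK_PATTERNS = [
    r"\bsuicid",
    r"\bkill (myself|yourself)\b",
    r"\bend (my|your) life\b",
]

CRISIS_REPLY = ("I cannot help with this safely. Please contact a local "
                "emergency service or a crisis line right now.")
FALLBACK_REPLY = "I am not sure how to respond to that. Could you tell me more?"


def is_risky(text: str) -> bool:
    """True if any risk pattern occurs in the text (case-insensitive)."""
    return any(re.search(pattern, text, re.IGNORECASE) for pattern in RISK_PATTERNS)


def guarded_reply(user_text: str, generate: Callable[[str], str]) -> str:
    """Return a scripted reply for risky input and block risky generated output."""
    if is_risky(user_text):
        return CRISIS_REPLY      # never hand crisis input to the free-text model
    candidate = generate(user_text)
    if is_risky(candidate):
        return FALLBACK_REPLY    # discard an unsafe generated candidate
    return candidate


if __name__ == "__main__":
    def echo_model(text: str) -> str:
        return "Tell me more about: " + text

    print(guarded_reply("I had a rough day at work", echo_model))
```

Such a guard narrows rather than removes the risk, which is consistent with the argument that free text capabilities need domain-specific constraints before being used in mental health settings.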
While the authors of the reviewed works were aware of the dangers of unconstrained textual input, their conversational models seemed too limited compared to what is currently possible. One glaring omission, which current language technologies make feasible at least to explore and make progress on, is a conversational model that remembers historical interactions with the same user. This enables a more long-term connection between the ICA and the user, in which the therapy has many more possibilities to explore. The bond that forms and the information that can be gathered can produce much better outcomes. One possible reason why the authors did not implement this is convenience and privacy: the user does not have to create an account, which removes some initial barriers to the system use, and the system does not have to store any historical data on the user, which enhances privacy.
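The tension between remembering historical interactions and protecting privacy can be illustrated with a minimal sketch of a memory that stores short notes under a salted hash of the user name instead of the name itself. The class and method names are hypothetical, and a salted hash provides pseudonymity only, not anonymity; this is an assumption-laden illustration rather than a recommended privacy design.

```python
import hashlib
from typing import Dict, List


class ConversationMemory:
    """Keeps short notes from earlier sessions under a pseudonymous key."""

    def __init__(self, salt: str) -> None:
        self._salt = salt
        self._history: Dict[str, List[str]] = {}

    def _key(self, user_name: str) -> str:
        # Salted SHA-256 so the raw user name is never used as a storage key.
        return hashlib.sha256((self._salt + user_name).encode("utf-8")).hexdigest()

    def remember(self, user_name: str, note: str) -> None:
        self._history.setdefault(self._key(user_name), []).append(note)

    def recall(self, user_name: str, last_n: int = 3) -> List[str]:
        """Return up to last_n most recent notes, oldest first."""
        return list(self._history.get(self._key(user_name), [])[-last_n:])

    def forget(self, user_name: str) -> None:
        """Delete everything stored for this user."""
        self._history.pop(self._key(user_name), None)


if __name__ == "__main__":
    memory = ConversationMemory(salt="example-salt")
    memory.remember("ana", "slept badly before an exam")
    memory.remember("ana", "breathing exercise helped")
    print(memory.recall("ana"))
    memory.forget("ana")
    print(memory.recall("ana"))
```

Recalled notes could seed the opening prompt of the next session, while the explicit `forget` method keeps deletion of stored data a first-class operation.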
Privacy concerns may also be the reason why there is so little user modeling and consequential personalization. The systems collect very little data on the users, which makes them static and inflexible in terms of how they can personalize their strategies to the user and adapt to various individual specificities. Since the current systems do seem to employ ABC theories and strategies, personalizing the offered help to specific groups that are affected more by specific strategies [127] should be the logical next step in progressing these systems.
Because the conversational models are often the most fleshed-out part of the reviewed ICAs, their cognitive architectures are not thought out in great detail, sometimes embedding only the conversational model. This can oversimplify how the system functions, which has its place for certain purposes (very general and quick first help), but does not explore the possibilities that modeling other cognitive capabilities can bring. Over-reliance on conversational models has another downside: most work well (where they work well) for the English language [128], but hardly for other languages. Lexicons for relevant feature extraction and language datasets for training are few in non-English languages compared to how many exist in English. Opting for anything other than English hinders the possibility of achieving SOTA capabilities in the explored systems. A further consequence is that non-English speakers cannot use the majority of the systems produced.
Some designs of ICA cognitive architectures [31] have suggested how to sensibly use more advanced technology which might result in better outcomes, but have so far not been implemented or evaluated. They emphasize personalization and adaptation through strong user modeling and learning from historical interactions. It is clear that ICAs for ABC in mental health have a lot of room to grow technologically, should there be enough research in the field. The most important lesson to note is that the outcomes such ICAs produce are emergent: they represent a thoroughly researched and thought-out result of highly interdisciplinary efforts, but more specifically, their behavior stems from various modules that model different cognitive abilities interacting with each other. This points to researchers needing to cooperate or to be interdisciplinary themselves, not only focusing on narrow intradisciplinary or technical knowledge.

Conclusions and Future Work
This state-of-the-art technical review presents the first technical review of intelligent cognitive assistants that produce attitude and behavior change for people with stress, anxiety and depression symptoms. It introduces the topic of change and its importance as the holy grail of different research fields and human endeavors, lays out our motivation for the work, and continues to describe the interdisciplinary connections between attitude and behavior change support systems, intelligent cognitive assistant technology, and digital mental health. It presents related works-similar reviews, but points to these not being technical and targeting health practitioners, which can be discerned from the lack of technological analysis of the systems. The work further lays out our methodology, presents the process of finding and selecting papers and, finally, presents the results, which are put into context in the discussion. The results tell a story of how various systems try to achieve change, employing various technological and scientific mechanisms. However, these systems do not reflect the possible SOTA, which can be achieved with more research.
The biggest limitation of this work, as already addressed, concerns the lack of inclusion of various proprietary systems, which would bring additional value to the technical analysis this paper offers. Another limitation might be the specific criteria we constructed for the paper selection. Although we tried to produce the criteria non-arbitrarily, providing reasons for our decisions, some important papers might not have been included in this work. We must also consider that papers that would fit the criteria might only be available in some smaller, specific databases that we did not include in our search. What was also limiting was our focus on the most common mental health issues, and mental health issues that such technologies usually target, especially since they are mostly experienced among the non-clinical population. Including other mental health issues would widen the scope, meaning that papers with systems targeting these mental health issues could include technologies not covered in this work. The final identified limitation concerns the related work covered. We focused on reviews that examined the outcomes of using the reviewed systems, but it would be worthwhile to explore reviews of such systems that focus on some other aspect, e.g., acceptability, convenience of use, adherence, and data protection and privacy solutions.
Our future work is guided by the limitations listed. Including different kinds of systems for attitude and behavior change in mental health is needed to explore how technologies not covered here might prove beneficial. Including other mental health issues (e.g., autism, psychosis) is needed to explore how certain technologies might only work for certain mental health issues. Surveying the suggestions for systems' designs is also something to consider, so as to consolidate various lessons learned.
The novel contribution that this review represents points to the still emerging research field that is gaining prominence due to the ubiquity of technology and the rise of mental health issues. With meaningful integration with the existing mental healthcare and further research, artificial systems might play an important role in bettering the current mental landscape.