Effects of a Chatbot-Based Intervention on Stress and Health-Related Parameters in a Stressed Sample: Randomized Controlled Trial

Background: Stress levels and the prevalence of mental disorders in the general population have been rising in recent years. Chatbot-based interventions represent novel and promising digital approaches to improve health-related parameters. However, there is a lack of research on chatbot-based interventions in the area of mental health.
Objective: The aim of this study was to investigate the effects of a 3-week chatbot-based intervention guided by the chatbot ELME, specifically with respect to the ability to reduce stress and improve various health-related parameters in a stressed sample.
Methods: In this multicenter two-armed randomized controlled trial, 118 individuals with medium to high stress levels were randomized to the intervention group (n=59) or the treatment-as-usual control group (n=59). The ELME chatbot guided participants of the intervention group through 3 weeks of training based on the topics stress, mindfulness, and interoception, with practical and psychoeducative elements delivered in two daily interactive intervention sessions via a smartphone (approximately 10-20 minutes each). The primary outcome (perceived stress) and secondary outcomes (mindfulness; interoception or interoceptive sensibility; subjective well-being; and emotion regulation, including the subfacets reappraisal and suppression) were assessed preintervention (T1), post intervention (T2; after 3 weeks), and at follow-up (T3; after 6 weeks). During both conditions, participants also underwent ecological momentary assessments of stress and interoceptive sensibility.
Results: There were no significant changes in perceived stress (β03=–.018, SE=.329; P=.96) and momentary stress. Mindfulness and the subfacet reappraisal significantly increased in the intervention group over time, whereas there was no change in the subfacet suppression. Well-being and momentary interoceptive sensibility increased in both groups over time.
Conclusions: To gain insight into how the intervention can be improved to achieve its full potential for stress reduction, besides a longer intervention duration, specific sample subgroups should be considered. The chatbot-based intervention seems to have the potential to improve mindfulness and emotion regulation in a stressed sample. Future chatbot-based studies and interventions in health care should be designed based on the latest findings on the efficacy of rule-based and artificial intelligence–based chatbots.
Trial Registration: German Clinical Trials Register DRKS00027560; https://drks.de/search/en/trial/DRKS00027560
International Registered Report Identifier (IRRID): RR2-doi.org/10.3389/fdgth.2023.1046202

Overall, there is a need for research on chatbot-based interventions considering standardized characteristics (eg, intervention duration, samples, outcome assessments) and guidelines. Furthermore, interoception has not been the focus of previous research on chatbot-based interventions, neither being included as part of the intervention contents nor implemented as ecological momentary assessment (EMA) measures. EMA represents a flexible approach to measure real-time data, including health data, in everyday life [33]. Therefore, to fill these gaps, we developed a new chatbot-based intervention fostering the abilities of interoception, mindfulness, and stress management in everyday life.
The aim of this study was to investigate the effects of a 3-week chatbot-based intervention on stress, mindfulness, interoception, subjective well-being, and emotion regulation in individuals with medium to high stress levels. Based on previous findings, perceived stress was chosen as the primary outcome. Further details are described in the study protocol [34].
We hypothesized that: (1) the primary outcome (perceived stress) will be reduced in the intervention group compared to the treatment-as-usual control group over time, as assessed at preintervention (T1), post intervention (T2), and at the 3-week follow-up (T3) and via EMA; and (2) the secondary outcomes (mindfulness; interoception, including interoceptive sensibility; subjective well-being; and emotion regulation) will be improved in the intervention group compared to the control group, as assessed at T1, T2, and T3. Momentary interoceptive sensibility and stress were also assessed via EMA. Furthermore, adherence, dropout reasons, usability, and user feedback regarding the intervention were assessed to potentially further improve the intervention for future research.

Setting and Recruitment
The data collection took place between February and September 2022. German-speaking people were recruited via offline and online recruitment strategies. Participants were included in the study if they (1) were 18 years or older, (2) had sufficient knowledge of the German language, (3) owned a smartphone (Android or iOS) with internet access, (4) possessed a valid smartphone number, (5) possessed a valid mailing address, (6) experienced a medium to high level of perceived stress (according to a 10-item Perceived Stress Scale [PSS-10] score ≥14, assessed at screening [T0]), (7) were not diagnosed with any mental disorder, (8) were not undergoing psychotherapy, and (9) were not currently participating in another online mental health intervention.
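The nine inclusion criteria above can be read as a single conjunction of checks. The following Python sketch is only an illustration of that logic and is not the screening software used in the trial; the record type `ScreeningRecord`, its field names, and the function `is_eligible` are hypothetical, and only the age limit and the PSS-10 cutoff of 14 are taken from the text.

```python
from dataclasses import dataclass

PSS10_CUTOFF = 14  # threshold stated in the text: PSS-10 score >= 14 at screening (T0)


@dataclass(frozen=True)
class ScreeningRecord:
    """Hypothetical screening record; all field names are illustrative."""
    age: int
    sufficient_german: bool
    smartphone_with_internet: bool
    valid_phone_number: bool
    valid_mailing_address: bool
    pss10_score: int
    diagnosed_mental_disorder: bool
    in_psychotherapy: bool
    in_other_online_intervention: bool


def is_eligible(r: ScreeningRecord) -> bool:
    """Return True only if all nine inclusion criteria are met."""
    return (
        r.age >= 18
        and r.sufficient_german
        and r.smartphone_with_internet
        and r.valid_phone_number
        and r.valid_mailing_address
        and r.pss10_score >= PSS10_CUTOFF
        and not r.diagnosed_mental_disorder
        and not r.in_psychotherapy
        and not r.in_other_online_intervention
    )


# Made-up example candidate, not study data.
candidate = ScreeningRecord(34, True, True, True, True, 17, False, False, False)
print(is_eligible(candidate))
```

Expressing the criteria as one boolean expression keeps each criterion visible on its own line, which makes it easy to compare the check against the numbered list in the text.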

Study Design
The intervention group received a 3-week online-based intervention guided by the chatbot ELME. The control group received treatment as usual (ie, they received no intervention content and only answered the questionnaires and the EMAs). Primary and secondary outcomes were assessed in both groups at T0, T1, daily during the intervention (between T1 and T2), T2, and T3. The design of the study and the usability of the chatbot were successfully tested in a previous feasibility study [35]. The trial was registered a priori at the World Health Organization (WHO) International Clinical Trials Registry Platform via the German Clinical Trials Register (DRKS00027560) on January 6, 2022. The detailed design of this two-armed, parallel RCT is presented in the published study protocol [34].

Study Procedure
Figure 1 provides a schematic of the study procedure including the final numbers of participants.

Intervention
ELME is a rule-based chatbot, implemented as a web-based mobile app. ELME offers psychoeducation, exercises in real-time dialogues with the chatbot, audio files, and individual feedback. Sessions were held twice a day (for approximately 10-20 minutes each) over 3 weeks and with flexible timing. Participants could postpone exercises and receive SMS text message reminders. For more detailed intervention information and the detailed procedure, see descriptions in the study protocol [34]. Examples of representative dialogues of the interaction between the chatbot and a participant are depicted in Figure 2.

Ethical Considerations
All study procedures were approved by the ethics committee of Ulm University (application number 401/20). Written informed consent was obtained from all participants prior to their participation. As an incentive, participants could take part in the intervention for free and had the chance to win a €25 (approximately US $26) gift card from an online shop or, as a student participant, to receive 5 course credits as expense allowance for completing the questionnaires. As a further incentive, participants who completed the T3 questionnaire could access two relaxation exercises and obtain individual summaries of the change in their health-related parameters from preintervention to follow-up.

Primary Outcome: Perceived Stress
The PSS-10 [36] was used as a screening questionnaire. At T1 to T3, perceived stress was assessed via the 4-item short scale (PSS-4). The ratings on both scales, ranging from 0="never" to 4="very often," were calculated as sum scores, with higher scores representing higher perceived stress.
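As a minimal sketch of the sum scoring described above, the following Python function adds up ratings on the 0-4 scale and reverses positively worded items so that higher totals mean higher perceived stress. The text does not state which items are reverse-keyed; the default used here (items 2 and 3 of the PSS-4) follows the commonly published scoring of the scale and should be treated as an assumption, as should the function and constant names.

```python
def pss_sum_score(ratings, reverse_keyed=()):
    """Sum PSS ratings (0-4 each); items at the given zero-based
    positions are reverse-scored (4 - rating) before summing."""
    for r in ratings:
        if not 0 <= r <= 4:
            raise ValueError("PSS ratings must lie between 0 and 4")
    return sum(4 - r if i in reverse_keyed else r for i, r in enumerate(ratings))


# Assumption: items 2 and 3 of the PSS-4 are positively worded
# (zero-based positions 1 and 2) and therefore reverse-scored.
PSS4_REVERSE_KEYED = (1, 2)

# Made-up example ratings, not study data: 3 + (4-1) + (4-2) + 3 = 11
print(pss_sum_score([3, 1, 2, 3], PSS4_REVERSE_KEYED))
```

With four items the resulting score ranges from 0 to 16; passing ten ratings and the matching reverse-keyed positions would give the 0-40 range of the PSS-10.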

Mindfulness
The 14-item short version of the Freiburg Mindfulness Inventory [37] was used to assess mindfulness. Answers were rated on a 4-point Likert scale ranging from 1="rarely" to 4="almost always." A sum score (range 14-56) was calculated, with higher scores indicating higher mindfulness.

Subjective Well-Being
The 5-item WHO Well-Being Index [40,41] was used to assess subjective well-being. Participants responded on a 6-point Likert scale ranging from 5="all of the time" to 0="at no time." The raw sum score (range 0-25) was multiplied by 4 to obtain a total score (range 0-100, with 100 indicating the best well-being).
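The conversion from raw score to total score is simple arithmetic and can be sketched as follows. The function name `who5_total_score` is illustrative, and the example ratings are made up rather than taken from the study.

```python
def who5_total_score(ratings):
    """Convert five WHO-5 item ratings (0-5 each) into the 0-100 total score."""
    if len(ratings) != 5:
        raise ValueError("the WHO-5 has exactly 5 items")
    for r in ratings:
        if not 0 <= r <= 5:
            raise ValueError("WHO-5 ratings must lie between 0 and 5")
    raw_score = sum(ratings)   # raw score, range 0-25
    return raw_score * 4       # total score, range 0-100


# Made-up example: raw score 3 + 2 + 4 + 1 + 3 = 13, total score 13 * 4 = 52
print(who5_total_score([3, 2, 4, 1, 3]))
```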

Emotion Regulation
The German version [42] of the Emotion Regulation Questionnaire [43] was used to assess emotion regulation. The 10-item questionnaire included 6 items representing emotion regulation strategy reappraisal and 4 items assessing emotion regulation strategy suppression, rated on a 7-point Likert scale ranging from 1="strongly disagree" to 7="strongly agree." Accordingly, the mean scores reflect the use of and preferences for various emotion regulation strategies.
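The two subscale means can be sketched in Python as below. The text gives only the number of items per subscale; the item-to-subscale assignment used here (reappraisal: items 1, 3, 5, 7, 8, 10; suppression: items 2, 4, 6, 9) follows the original English questionnaire and is an assumption for the German version, as are the function and constant names.

```python
from statistics import mean

# Assumption: item numbering as in the original English questionnaire.
REAPPRAISAL_ITEMS = (1, 3, 5, 7, 8, 10)
SUPPRESSION_ITEMS = (2, 4, 6, 9)


def erq_subscale_means(ratings):
    """Return mean reappraisal and suppression scores from 10 ratings
    (1-7 each), given in item order (item 1 first)."""
    if len(ratings) != 10:
        raise ValueError("the questionnaire has exactly 10 items")
    for r in ratings:
        if not 1 <= r <= 7:
            raise ValueError("ratings must lie between 1 and 7")
    return {
        "reappraisal": mean(ratings[i - 1] for i in REAPPRAISAL_ITEMS),
        "suppression": mean(ratings[i - 1] for i in SUPPRESSION_ITEMS),
    }


# Made-up example ratings, not study data.
scores = erq_subscale_means([6, 2, 5, 3, 6, 2, 5, 6, 1, 5])
print(scores)
```

Keeping the two subscales separate mirrors the analysis in this study, which models reappraisal and suppression as distinct outcomes.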

Ecological Momentary Assessment
Momentary perceived stress and momentary interoceptive sensibility were measured via EMAs twice a day (in the morning and in the afternoon). Momentary perceived stress was assessed via two adapted items for the momentary use of the PSS-4 [36]: "Do you feel that things are going your way?" and "Do you find you can cope with all the things that you have to do?" The items "How present do you feel at the moment?" and "How aware are you of your own body at the moment?" [31,44] were used to measure momentary body awareness. To assess interoceptive sensibility, we used a self-developed question, taking the heartbeat perception task developed by Schandry [45] into account: "How intense do you perceive your heartbeat in the moment?" All rating scales were presented as visual analog scales ranging from 0="not at all" to 100="very much."

Mental Health App Usability Questionnaire
To assess the usability of the chatbot as a mental health app, a self-translated German version of the 18-item Mental Health App Usability Questionnaire [46] was used, rated on a scale ranging from 1="strongly agree" to 7="strongly disagree." The questionnaire comprises the following three subscales: ease of use (5 items), interface and satisfaction (7 items), and usefulness (6 items). Mean scores were calculated for each subscale and for the total scale, with lower scores reflecting higher usability.
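One way to read the usability scoring is as three subscale means plus a mean over all 18 items, sketched below in Python. The text gives only the number of items per subscale; the assumption that the items appear in subscale order (items 1-5, 6-12, and 13-18) is ours, and the function name is illustrative.

```python
from statistics import mean

# Assumption: items are ordered by subscale (5 + 7 + 6 = 18 items).
SUBSCALE_SLICES = {
    "ease_of_use": slice(0, 5),
    "interface_and_satisfaction": slice(5, 12),
    "usefulness": slice(12, 18),
}


def usability_mean_scores(ratings):
    """Return subscale means and the total mean for 18 ratings (1-7 each).
    With 1 = "strongly agree", lower means indicate higher usability."""
    if len(ratings) != 18:
        raise ValueError("the questionnaire has exactly 18 items")
    for r in ratings:
        if not 1 <= r <= 7:
            raise ValueError("ratings must lie between 1 and 7")
    scores = {name: mean(ratings[s]) for name, s in SUBSCALE_SLICES.items()}
    scores["total"] = mean(ratings)
    return scores


# Made-up example ratings, not study data.
example = usability_mean_scores([1] * 5 + [2] * 7 + [3] * 6)
print(example)
```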

Adherence, Potential Dropout Reasons, and User Feedback
Adherence to the intervention was operationalized by the percentage of completed modules of the intervention. Reasons for potential dropout were assessed via the Dropout Reasons Questionnaire for Internet Interventions [47]. User feedback questions asking the participants if they liked the training (range 1-10) and judging the extent of the training (1="too short" to 12="too long") were also assessed.
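The adherence measure is a simple percentage, which the following Python sketch makes explicit. The total number of modules is not stated in this section, so the value 42 used in the example (two sessions per day over 21 days) is a hypothetical placeholder, as are the function names and the example counts.

```python
from statistics import mean


def adherence_percent(completed_modules, total_modules):
    """Percentage of intervention modules completed by one participant."""
    if total_modules <= 0:
        raise ValueError("total_modules must be positive")
    if not 0 <= completed_modules <= total_modules:
        raise ValueError("completed_modules must lie between 0 and total_modules")
    return 100.0 * completed_modules / total_modules


def mean_adherence(completed_per_participant, total_modules):
    """Mean adherence (in percent) across participants."""
    return mean(adherence_percent(c, total_modules) for c in completed_per_participant)


# Hypothetical module count and made-up completion counts, not study data.
TOTAL_MODULES = 42
print(mean_adherence([42, 21, 0], TOTAL_MODULES))
```

Averaging per-participant percentages, as done here, weights every randomized participant equally, which matches an intention-to-treat view of adherence.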

Data Analysis
Data analyses were performed according to the intention-to-treat principle. Due to the nested longitudinal data structure, hierarchical linear regression models were constructed to investigate the intervention effects over time. The measurement points (level 1) were nested within the participants (level 2). The regression analyses include the 3 measurement points preintervention (T1), post intervention (T2), and follow-up (T3). We analyzed hierarchical linear models and model comparisons in R using the packages lme4 [48], lmerTest [49], and r2mlm [50]. The predictor variable time had an interpretable 0 point and the dichotomous predictor group was dummy-coded. Due to assumed interindividual and intraindividual differences in all outcome variables, random-intercept, random-slope models were calculated. The restricted maximum-likelihood estimator was applied for parameter estimation, as it is generally considered to be less biased than the maximum-likelihood estimator [51]. We here report the main results that address hypotheses (1) and (2). The significance level for all analyses was set to P≤.05.
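For orientation, a random-intercept, random-slope model of this kind can be written in combined form as follows. This is a sketch rather than the authors' exact specification: the symbols are ours, the time coding (T1 = 0) and group coding (control = 0, intervention = 1) are assumptions consistent with the description above, and the coefficient indices are chosen so that the cross-level interaction is labeled β03, as in the abstract.

```latex
\begin{align}
Y_{ti} &= \beta_{00}
        + \beta_{01}\,\mathrm{time}_{ti}
        + \beta_{02}\,\mathrm{group}_{i}
        + \beta_{03}\,(\mathrm{time}_{ti} \times \mathrm{group}_{i})
        + u_{0i} + u_{1i}\,\mathrm{time}_{ti} + e_{ti}, \\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
       &\sim \mathcal{N}\!\left(
          \begin{pmatrix} 0 \\ 0 \end{pmatrix},
          \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
        \right),
\qquad e_{ti} \sim \mathcal{N}(0, \sigma^{2}).
\end{align}
```

Here Y_ti is the outcome of participant i at measurement point t, u_0i and u_1i are the participant-specific deviations in intercept and slope, and e_ti is the level-1 residual. Under this coding, β03 captures how much the change over time in the intervention group differs from that in the control group, which is the effect addressed by both hypotheses.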

Participant Characteristics
A total sample of 118 participants was randomized to the intervention group (n=59; 72% female) and to the control group (n=59; 81% female).The relevant descriptive statistics at T1 are summarized in Table 1; there were no significant differences between the groups at T1.

Perceived Stress
According to the model regarding perceived stress (Table 2), the nonsignificant fixed effect of the level-1 predictor time indicated that the stress levels did not change over time (from T1 to T3). The fixed effect of the level-2 predictor group and the cross-level interaction of the variables time and group were also not significant. The results of the two models predicting momentary perceived stress showed neither significant main effects of time and group nor their interactions (see Tables S1 and S2 in Multimedia Appendix 1).

Mindfulness
The results of the model regarding mindfulness (Table 3) showed no significant fixed effects for time and group. However, the cross-level interaction of time and group was significant.

Interoceptive Sensibility
The results of the model predicting interoceptive sensibility (assessed via the IAS) revealed neither significant main effects of time or group nor their interaction (see Table S3 in Multimedia Appendix 1). Similarly, assessments via the BPQ showed no significant effects (see Table S4 in Multimedia Appendix 1).
Momentary interoceptive sensibility increased on average over time in both groups. However, the main effect for group and the cross-level interaction of time and group were not significant (see Table S5 in Multimedia Appendix 1).

Well-Being
As shown in Table 4, subjective well-being improved over time in both groups on average. However, neither the main effect of group nor the cross-level interaction of time and group was significant.

Emotion Regulation: Reappraisal Subfacet
The results of the model concerning the subfacet reappraisal of emotion regulation (Table 5) revealed neither a significant effect of time nor of group. However, the cross-level interaction of time and group was significant.

Emotion Regulation: Suppression Subfacet
Results regarding the suppression subfacet of emotion regulation revealed no significant changes (see Table S6 in Multimedia Appendix 1).

Adherence, Dropout Reasons, and User Feedback
The mean adherence (percentage of completed modules) was 58% for the 59 participants in the intervention group; 23 participants skipped intervention units, with "no time" cited as the main reason (n=19). In addition, 22 participants reported technical problems. The average answer rate of the EMA questions was 48% in the intervention group and 66% in the control group. In response to the question if the participants liked the training, the mean score was 6.95 (SD 1.86). The extent of the training received a mean rating of 7.62.

Principal Results
The aim of this study was to examine the effects of a 3-week chatbot-based intervention on perceived stress and various health-related parameters in stressed individuals. The results show no significant changes in perceived stress after the intervention. There was a significant increase in mindfulness and in emotion regulation as assessed by the subfacet reappraisal in the intervention group over time, whereas there was no change in the suppression subfacet of emotion regulation. Well-being and momentary interoceptive sensibility increased in both groups over time.

Effects on Perceived Stress
The nonsignificant reduction in perceived stress is in line with the findings of a similar intervention study [52] and a pilot study [22]; however, given the statistical power problems of these studies, the intervention duration or intensity might be one factor to consider when interpreting the absence of effects in the present study. Another explanation could be that there might have been a greater initial focus on stress perception, which would potentially buffer the stress-reducing effects of the intervention. This is supported by findings from psychotherapeutic interventions [53,54], in which the hypothesized effects on psychological outcomes were only detected later because of the confrontation with emotionally charged topics. Furthermore, the results of studies by Baer et al [55] and Venkatesan et al [56] indicated that the effects on perceived stress might become (more) visible after a longer duration of the intervention.
The results regarding momentary perceived stress are in line with previous studies evaluating the effects of 3-month mindfulness-based interventions [57,58]. Moreover, the mean adherence of 58% for the present intervention and the mean answer rates of the EMA questions need to be taken into account when interpreting the results.

Effects on Mindfulness
The significant increase in mindfulness is in line with previous findings from online mindfulness-based interventions (eg, [2,3]), indicating that the 3-week chatbot-based intervention comprising mindfulness-based content has the potential to increase mindfulness over time in a stressed sample. A possible mechanism might be that the contents of the intervention addressing mindfulness, stress, and interoception support mindfulness. Nevertheless, mindfulness needs to be interpreted as a secondary outcome in this study.

Effects on Interoceptive Sensibility
The absence of effects on interoceptive sensibility in this study is in contrast to previous positive effects found for diverse mindfulness-based interventions (eg, [31,32,59]). However, these effects were found in the context of interventions lasting at least 8 weeks. In particular, and in line with the present findings, a 1-week mindfulness-based intervention [32] or a 3-week heartbeat perception training [60] could not improve interoceptive abilities. The findings of this study support the conclusions put forth by Fischer et al [59], Bornemann and Singer [31], and Schillings et al [60] that a longer intervention might be necessary to effectively improve interoceptive abilities. Moreover, previous studies differed in the methods used to assess diverse dimensions of interoceptive abilities (eg, [61,62]). Finally, a longer intervention design of such an innovative chatbot-based intervention might only be reasonable after initial trials with a shorter intervention design such as that of 3 weeks used in this study.
Because this study used newly developed EMA questions, whereas the previous EMA study by Höller et al [63] used a different study design that did not include an intervention, the results of the two studies are not comparable. The significant increase in momentary interoceptive sensibility could be explained by a training effect of frequent EMAs, which took place twice a day over 3 weeks.

Effects on Emotion Regulation
In line with the results regarding reappraisal, a recent systematic review and meta-analysis on mental health apps to promote emotion regulation and positive mental health in the general population [64] found a medium effect size (g=0.49) for emotion regulation compared to control conditions. However, it must be emphasized that this effect was based on only 6 studies, reflecting the lack of RCTs on chatbot-based interventions addressing emotion regulation.

Effects on Well-Being
The increase in well-being is in line with comparable previous studies [18,21,22,65] considering differences in the study designs and samples. However, well-being also improved in the control group of this study, which might have also been induced by the daily EMAs as potential positive triggers or observational processes.

Strengths and Limitations
To the best of our knowledge, this is the first chatbot-based intervention study including contents and assessments on interoception, as well as its association with mindfulness and stress. Further strengths of the study are the highly standardized design in line with the CONSORT (Consolidated Standards of Reporting Trials) guidelines [66,67] and EMAs of interoceptive sensibility [44,57,63]. Furthermore, the design and the usability of the chatbot were successfully tested in a previous feasibility study [35]. Therefore, the chatbot fulfills the required standards of chatbots for mental health support [6]. Finally, the results indicate the high usability of the chatbot.
Limitations of this study should be mentioned and considered for the design of future studies. First, adherence to the intervention was relatively low at only 58%. Nevertheless, this adherence rate is about average compared with other online mindfulness-based interventions, which have reported adherence rates ranging from 35% to 92% [68]. It should also be noted that adherence rates of digital or chatbot-based interventions were often not reported or were operationalized by diverse assessments [20,69], and lack of long-term user engagement in eHealth is a common problem [70,71]. Second, there was a majority of female participants in this study, representing 77% of the sample. Therefore, future intervention studies should consider diverse strategies to specifically address male participants. Third, this study exclusively assessed self-report data. Due to potential differences from objective physiological data [72], future studies should assess both subjective and objective data, especially regarding stress and interoception. Lastly, a text- and rule-based chatbot as used in this study might lack human-like characteristics, such as those regarding the type of interaction between the chatbot and the user. Recent meta-analyses [73,74] showed that chatbot-based studies are more effective when diverse input and output modalities are combined. A multimodal chatbot might be superior because it will appear to be more lively and flexible in dialogues with the user [75] and ready to interact.

Conclusions and Future Research
To gain insight into how such interventions can be improved to achieve their full potential for stress reduction, besides a longer intervention duration, specific sample groups should be considered, such as employees, diverse age groups, and clinical or subclinical populations, aiming to adapt to individual needs and preferences in everyday life. A chatbot-based intervention seems to have the potential to improve mindfulness and emotion regulation in a stressed sample. Additional factors such as the participants' social motivation regarding the guidance by the chatbot and the personality of the chatbot [70,76] would be of further interest to foster the alliance or a therapeutic relationship between the user or a patient and the chatbot. Future studies should also investigate the specific elements that have the greatest effects to improve diverse health parameters, such as psychoeducation or exercises. Future research should implement large language models to provide and further develop diverse artificial intelligence (AI) chatbots in digital mental health interventions [77,78]. Recent findings such as those showing that AI-based chatbots are more effective in clinical or subclinical populations [74] need to be considered. Nevertheless, besides the potential of AI-based chatbots for a professional mental health service, emerging reputational risks of AI-based chatbots such as safety and data privacy issues [79,80]; gender, ethnic, and socioeconomic biases [81]; limited empathy and emotional awareness as compared to a human counterpart [82]; and hallucinations [83] should be discussed extensively.
In summary, based on the numerous prospects of chatbots in the psychological and medical field, such as counselling, psychotherapy, diagnostic assessment, and interventions [23,84,85], future studies are needed to derive robust implications in these fields.

Figure 2. Screenshots of representative dialogues in the interaction between the chatbot and a participant (in German).

Table 1. Comparison of relevant participant characteristics at baseline.

Table 2. Random-intercept, random-slope model for perceived stress with the predictors time, group, and their interaction. Fixed-effects coefficients are β values reported with SEs; random-effects coefficients are σ² (variance) values reported with SDs.

Table 3. Random-intercept, random-slope model for mindfulness with the predictors time, group, and their interaction. Fixed-effects coefficients are β values reported with SEs; random-effects coefficients are σ² (variance) values reported with SDs.

Table 4. Random-intercept, random-slope model for well-being with the predictors time, group, and their interaction. Fixed-effects coefficients are β values reported with SEs; random-effects coefficients are σ² (variance) values reported with SDs.

Table 5. Random-intercept, random-slope model for the emotion regulation reappraisal subfacet with the predictors time, group, and their interaction. Fixed-effects coefficients are β values reported with SEs; random-effects coefficients are σ² (variance) values reported with SDs.