1 Background

1.1 Clinical context

Delirium is a deterioration in mental functioning that occurs within hours or days and is usually triggered by an acute medical pathology, a trauma, or drugs. It is one of the most frequent medical emergencies, with a prevalence of around 20% in patients admitted to medical services, and even higher in special services such as Orthogeriatric Units or Intensive Care Units [1]. The prevalence of delirium in hospitalized older patients has skyrocketed 90% at some points of the COVID-19 pandemic [2].

On the positive side, delirium is preventable in 30–40% of cases [3], and multicomponent non-pharmacological interventions are among the most beneficial strategies for primary prevention [4].

Virtual assistants may provide cognitive stimulation, social interaction, and sensory engagement to older patients during hospital stay. This can be achieved through a variety of methods, such as personalized conversations, games, and activities that are tailored to the patient's interests and abilities. Virtual assistants can also provide reminders for medication and daily routines, as well as information about the hospital environment and procedures to reduce confusion and anxiety [5, 6].

The use of a virtual assistant in the prevention phase of delirium could be particularly useful in contexts where patients are at high risk, such as cases of cognitive impairment, sensory deprivation, or a previous history of delirium [7]. This virtual assistant could also be used in combination with other preventative interventions such as early mobilization, sleep promotion, and medication management, to provide a comprehensive approach.

Smart personal assistants, such as Amazon Alexa, allow users to search for information or schedule events, among other functionalities. However, the use of virtual assistants in the health field is still largely unexplored, so their potential acceptance by older people is uncertain.

The ADELA project, funded by Fundación MAPFRE, aims to design and develop a conversational virtual assistant to prevent delirium in older people (aged 65 and over) during hospitalization. This population is the most prone to developing delirium during hospital admission, due to age-related changes in cognition and physiology [8].

1.2 Related work

Balsa et al. [9] proposed an intelligent personal assistant (VASelfCare) to help older people cope with type 2 diabetes. The assistant is integrated in a mobile app and the interaction is done via voice or gestures. A usability evaluation of the assistant obtained a SUS score of 73.75.

Dimeff et al. [10] presented a virtual assistant (Dr. Dave) aimed at reducing unnecessary hospitalizations and suicide events in general patients. To assess usability, the authors used the Usability Satisfaction and Acceptability Questionnaire (USAQ), adapted from SUS. A promising average USAQ score of 4.4 (out of 5) was obtained.

Ireland et al. [11] proposed a conversational assistant (Harlie) to treat neurological conditions such as Parkinson’s disease. Focus groups were organized to receive feedback from final users, including older people. In general, users had a positive impression, but they found some technical issues and problematic conversational responses.

Inkster et al. [12] developed a conversational assistant (Wysa) embedded in a mobile application focused on mental health for the general population. The system was tested in a study with participants who showed depressive symptoms, as analyzed through the Patient Health Questionnaire (PHQ-9) [13], and obtained an engagement above 70% among users.

Sun [14] investigated how visual enhancements to voice user interfaces, such as conversational assistants, might mitigate the usability challenges faced by older adults during interaction. The article concluded that integrating visual output as a feedback mechanism facilitates interaction between older adults and assistants.

Liu et al. [15] examined older adults’ preferences for intelligent virtual assistants regarding information modality and feedback. Results showed that the visual-auditory bimodality is superior to single visual modality and single auditory modality for older adults.

Markfeld et al. [16] conducted a study to examine the effect of different feedback modalities (visual and auditory) in a table-setting robot assistant for elder care. The visual feedback included the use of LEDs and a screen. The combination of LED lights and verbal commands increased participants' understanding, contributing to the quality of the interaction.

2 Objectives

The main objective of this research work is to design and develop ADELA, a conversational virtual assistant to prevent delirium in hospitalized older persons. Specific objectives are:

  • To implement a co-creation process with domain experts from the clinical field to extract functionalities and operational requirements.

  • To perform an iterative development process along with clinical domain specialists to produce a functional prototype of ADELA.

  • To refine the ADELA prototype through a usability study with potential end users. This enhanced version of ADELA will be used in a clinical trial aimed at demonstrating its usefulness to prevent delirium in hospitalized older patients (recruitment to be started in December 2022).

3 Material and methods

This section presents the methodology used to conceptualize, design, develop, and refine the conversational assistant.

Figure 1 shows the methodology followed, consisting of five phases, to achieve the final version of the conversational assistant.

Fig. 1 Summary of phases involved in the project's methodology

3.1 Co-creation process

The purpose of this phase was to extract information from domain experts to focus and guide the overall design process of ADELA.

The co-creation process was conducted through a series of meetings and workshops, which brought together a heterogeneous group consisting of three medical specialists (geriatricians), who are considered the domain experts, and two technologists. Prior to the meetings, ideas were proposed to reach common ground. The purpose of this process was to enable the development team to obtain a more holistic view of what the system should include and how it should behave. The Delirium Prevention Protocol by the Geriatrics Service of the Getafe University Hospital and the clinical guidelines of the National Institute for Health and Care Excellence [17] were used as a baseline to arrive at the functionalities presented in Table 1.

Table 1 Functional requirements extracted in co-creation phase

The Human-Centered Design approach was adopted to ensure that ADELA would be accepted and adopted. This approach involves understanding the users' needs, behaviors, and preferences through research and analysis, and then designing solutions that meet those needs. Participatory design methods, such as design workshops and usability tests, were employed to actively involve users and stakeholders throughout the design process [18].

These methods enabled the development team to gain insights into the users' mental models, expectations, and needs, and to ensure that the design of ADELA aligns with these factors. The use of participatory design is especially important in cases where the adoption and acceptance of new technologies by potential users are low, as it helps to create a usable and successful software system by closely adapting the design of the software to the mental model of the potential users [19].

The workshops were used to specify all the relevant aspects of the users and the system's context of use and to translate them into a system design. These workshops involved focus groups [20] attended by the co-creation team. The second method, usability tests, was later used to evaluate the effectiveness of the first designed prototype in meeting user needs and expectations.

3.2 Functionalities

Given the set of requirements identified in the previous phase, ADELA was designed to have the functionalities described below.

3.2.1 Basic intents

ADELA incorporates a set of basic functionalities that are launched by the user, called intents, which were developed to follow the flow shown in Fig. 2.

Fig. 2 Basic intent workflow in ADELA conversational assistant

To activate basic intents, the user must wake the assistant up by using the wake word “Adela”. A similar system is used in assistants such as Amazon Alexa [21]. A pre-trained neural model known as Porcupine, from the PicoVoice platform [22], was used for this task.

Each basic intent has been implemented in AWS Lex, a cloud service capable of extracting a user's intent from a recording. Additional cloud services such as AWS Lambda (to process intent responses) and AWS Polly (text to human speech) were used to complete this functionality.
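As an illustration of how an intent response might be processed in AWS Lambda, the following minimal sketch handles a single intent. It assumes the Lex V2 event and response layout; the text does not state which Lex version, intent names, or handler structure ADELA uses, so `GetTime`, `build_close_response`, and the injectable `now` parameter are hypothetical.

```python
from datetime import datetime


def build_close_response(intent_name, state, text):
    """Build a Lex V2-style response that ends the turn with one message."""
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent_name, "state": state},
        },
        "messages": [{"contentType": "PlainText", "content": text}],
    }


def lambda_handler(event, context, now=None):
    """Entry point called by Lex; `now` can be injected for testing (assumption)."""
    intent_name = event["sessionState"]["intent"]["name"]
    now = now or datetime.now()
    if intent_name == "GetTime":  # hypothetical intent name
        return build_close_response(intent_name, "Fulfilled", f"It is {now:%H:%M}.")
    # Any intent this sketch does not handle is reported as failed.
    return build_close_response(
        intent_name, "Failed", "Sorry, I did not understand. Could you repeat that?"
    )
```

In a setup like the one described, the returned text would then be passed to a text-to-speech service such as AWS Polly before being played to the user.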

The assistant incorporates 25 intents in total, such as asking for the time, requesting weather information, and starting a memory game, among others.

3.2.2 Reminders

ADELA provides reminders throughout the day, keeping users active with different activities and helping them remember things. Eight types of reminders, described in Table 1, have been included. The list of reminders is stored in an AWS DynamoDB cloud service database.
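The selection of reminders to deliver at a given moment could be sketched as follows. This is only an illustration under stated assumptions: the `Reminder` fields, the reminder type names, and the time-window logic are hypothetical, and an in-memory list stands in for the records that the text says are stored in AWS DynamoDB.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Reminder:
    kind: str      # reminder type, e.g. "hydration" (hypothetical name)
    at: time       # time of day at which the reminder should be spoken
    message: str   # text the assistant would read aloud


def due_reminders(reminders, window_start, window_end):
    """Return reminders scheduled in [window_start, window_end), ordered by time."""
    due = [r for r in reminders if window_start <= r.at < window_end]
    return sorted(due, key=lambda r: r.at)


plan = [
    Reminder("hydration", time(10, 0), "Remember to drink some water."),
    Reminder("orientation", time(9, 0), "Good morning. You are in the hospital."),
]
morning = due_reminders(plan, time(8, 30), time(9, 30))
```

Here `morning` contains only the 09:00 reminder, since the 10:00 one falls outside the window.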

3.2.3 Playing relaxing music

According to Table 1, ADELA offers the possibility to play relaxing music during the user's preparation for sleep, to create an appropriate atmosphere. To this end, several relaxing songs have been incorporated into the assistant.

3.2.4 Phone calls

ADELA can receive phone calls to facilitate communication between patients and their relatives, thanks to the integration of the Simcom SIM868 module [23]. This module works like a mobile phone.

A whitelist with relatives’ phone numbers is stored within AWS DynamoDB service to avoid nuisance calls.
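A whitelist check of this kind might look like the sketch below. The normalization rule (digits only, with a leading international `00` prefix dropped) and the function names are assumptions made for illustration; the text does not describe how ADELA compares numbers, and a plain list stands in for the DynamoDB table.

```python
import re


def normalize_number(raw):
    """Keep digits only and drop a leading international '00' prefix."""
    digits = re.sub(r"\D", "", raw)
    return digits[2:] if digits.startswith("00") else digits


def is_call_allowed(caller, whitelist):
    """Accept a call only if the caller's normalized number is whitelisted."""
    number = normalize_number(caller)
    if not number:  # hidden or malformed caller ID is always rejected
        return False
    return number in {normalize_number(n) for n in whitelist}
```

With this rule, `+34 600 111 222` and `0034600111222` are treated as the same number, so a relative is recognized regardless of how the network reports the caller ID.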

3.2.5 Lighting control

According to Table 1, ADELA must be able to control the illumination of the room where it is located to help patients maintain circadian rhythms. The lighting control has been implemented using a table lamp incorporating a smart bulb, but it is scalable to more complex settings.

The assistant automatically turns on and off the room lighting throughout the day, but it can also receive commands from the user.
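The combination of an automatic schedule with user commands can be sketched as a small decision function. The daytime window below (08:00 to 22:00) and the rule that a user command takes precedence over the schedule are assumptions for illustration only; the text does not give the actual schedule or precedence used by ADELA.

```python
from datetime import time

# Assumed daytime window; the real schedule is not given in the text.
LIGHTS_ON_FROM = time(8, 0)
LIGHTS_OFF_FROM = time(22, 0)


def light_should_be_on(now, user_override=None):
    """Decide the lamp state: a user command, if present, wins over the schedule."""
    if user_override is not None:
        return user_override
    return LIGHTS_ON_FROM <= now < LIGHTS_OFF_FROM
```

For example, `light_should_be_on(time(23, 0))` is false under the assumed schedule, while `light_should_be_on(time(23, 0), user_override=True)` keeps the lamp on after an explicit request.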

3.2.6 Cognitive games

ADELA incorporates six different memory games to promote cognitive stimulation. Games included are explained in detail in Appendix 2. These games require cloud services such as AWS Lex, DynamoDB, and Google Speech-to-Text.

3.3 Prototype design

The design of the working prototype was conducted partially in parallel to the functionalities phase described above.

3.3.1 Conceptual architecture

Figure 3 represents the conceptual architecture of the system, including the ADELA assistant. Components and relationships are described below:

  • The ADELA virtual assistant is the central technological component. The older person communicates with the assistant by voice.

  • The environment automation consists of a Wi-Fi lamp that is remotely controlled by the assistant.

  • External servers correspond to the cloud services used to meet the assistant’s needs, such as a database.

  • The medical professionals are responsible for creating personalized intervention plans, and adapting the assistant to the patient's needs and habits.

  • An alert log is maintained to be analyzed later in case it is necessary.

  • The relatives of the older person can communicate with him/her thanks to the built-in phone call functionality provided by the assistant.

Fig. 3 Conceptual architecture of the system

3.3.2 Hardware components

Figure 4 shows the physical prototype of the device running ADELA along with its components. The prototype includes an LED strip, controlled by an Arduino Nano, with different colors to indicate the status of the assistant (see Fig. 5). This functionality has been inspired by the work presented in [14, 15] and [16]. It has been claimed that visual-auditory bimodality in assistants and combining LED lights and verbal commands improves the interaction between assistants and older people.

Fig. 4 3D model of the assistant together with the components of each section

Fig. 5 LED colors to indicate different status of assistant

3.4 Evaluation

3.4.1 Usability evaluation

The main objective was to collect feedback and suggestions from potential users to improve the prototype, and thus minimize potential usability problems arising from an inadequate design. This is necessary to ensure good usability, user experience and acceptance of the ADELA assistant in a real-life scenario (i.e., first in a clinical trial and later in routine care).

The experiment consisted of an individual 30-min session with each participant following the script described in Appendix 1.

This evaluation was designed to be an observational study in which ten older persons (N = 10) participated. Participants, recruited at the Geriatrics Service of Getafe University Hospital, interacted with ADELA in a single working session and provided feedback. The tests were carried out in a room within the hospital premises, replicating the conditions of a realistic scenario. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of Getafe University Hospital (protocol code A06/22 approved 23 June 2022). Participation criteria were:

  • Inclusion criteria:

    o Age > 74 years.

    o No previous diagnosis of cognitive impairment.

  • Exclusion criteria:

    o Inability of the participant to understand and use the ADELA system.

Usability is defined as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use [24,25,26]. The assessment tools used were:

  • System Usability Scale (SUS) [27]: short 10-item Likert questionnaire that provides a measure of people’s subjective perceptions of the usability of a system. These 10 items can be evaluated from ‘1–fully disagree’ to ‘5–fully-agree’. Total score ranges from 0 to 100.

  • Chatbot Usability Questionnaire (CUQ) [28]: Likert questionnaire consisting of 16 items aimed at measuring usability of chatbots.

  • Ad-hoc satisfaction questionnaire: customized questionnaire with 5 open-ended questions to collect general user feedback (see Table 2). These questions were designed to complement aspects that are not directly covered by quantitative tools such as SUS and CUQ, so that additional shortcomings of the evaluated prototype can be identified and thus overcome.

Table 2 Ad-hoc questionnaire

3.4.2 Technical evaluation

Every usability evaluation session was audio-recorded to extract technical conclusions and later analyze the performance of the system and understand its limitations. Data used in this analysis comes from the usability phase (N = 10).

3.5 Refinement of the assistant

This phase consisted of introducing the changes derived from the evaluation of the first prototype with potential end-users. This led to a refined version of the assistant that will be deployed in a clinical trial at Getafe University Hospital.

4 Results and discussion

Descriptive statistics are used within this manuscript to analyze the results obtained. This methodology is a useful tool for summarizing and describing data such as those collected in these studies.

4.1 Usability evaluation

Answers provided to the ad-hoc qualitative questions aimed to extract potential improvements, which were used to guide the refinement process. In general, all participants found the assistant practical and enjoyable. The main shortcoming pointed out was the difficulty in waking the assistant up by pronouncing its name (“Adela”). It was usually necessary to repeat the name of the assistant twice or more to wake it up. Regarding explanations and messages provided by the assistant, they were perceived as clear and easy to understand. It was mentioned that the speed of the assistant's speech was sometimes too fast.

According to other studies where the usability of a system is measured by SUS, a system can be considered usable if its SUS score is higher than 68 [27]. In our usability study, an average score of 75 was obtained. Therefore, ADELA assistant could be considered already usable in its initial version.
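For reference, the standard SUS scoring rule can be written as a short function. This is a sketch of the usual published procedure (odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5), not the scripts used in this study; the function name is hypothetical.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten answers on a 1-5 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten answers between 1 and 5")
    # Items 1, 3, 5, 7, 9 are positively worded: each contributes (answer - 1).
    odd = sum(r - 1 for r in responses[0::2])
    # Items 2, 4, 6, 8, 10 are negatively worded: each contributes (5 - answer).
    even = sum(5 - r for r in responses[1::2])
    return (odd + even) * 2.5
```

A participant answering 3 to every item obtains a score of 50, so the reference value of 68 corresponds to answers that lean clearly towards the favorable end of the scale.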

Figure 6 shows the percentage of responses (positive, neutral, and negative) in five categories from grouped SUS questions. Most categories were positively evaluated. However, measuring the ease of learning to use ADELA showed that more than 50% of the participants expressed some difficulties.

Fig. 6 SUS answers grouped by categories

Regarding the CUQ assessment tool, other studies using it [28, 29] also take as a reference a score of 68, from which a system can be considered usable [27, 30, 31]. In this study, an average CUQ score of 85 was obtained, so the ADELA assistant can also be treated as very usable.

Figure 7 shows the average score per question in the CUQ questionnaire. Odd-numbered questions have a positive meaning if their score is 5. In this case, all odd-numbered questions achieved scores higher than 4, except for question 9, which assesses how well the assistant understands the user. Despite this, participants agree on the assistant's ability to handle and resolve errors (question 13).

Fig. 7 Average score on odd and even questions in CUQ. Odd questions score 5 positively, while even questions score 1 positively

On the other hand, even-numbered questions have a positive meaning if their score is 1. The worst results in this group, although positive, are found in questions 8 and 10. Some utterances or interactions from the assistant seem not to be sufficiently clear. Question 16, which indicates whether the assistant is complex, has a score slightly higher than 2, which can be considered an acceptable result, as older people do not see the assistant as complex to use.
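The odd/even polarity described above is what the CUQ score calculation relies on. The sketch below follows the scoring rule as commonly described for the CUQ (sum of odd items minus 8, plus 40 minus the sum of even items, normalized from a 0 to 64 range to 0 to 100); it is offered as an assumption about the calculation rather than as the procedure used in this study, and the function name is hypothetical.

```python
def cuq_score(responses):
    """Compute the CUQ score (0-100) from sixteen answers on a 1-5 scale."""
    if len(responses) != 16 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("CUQ needs exactly sixteen answers between 1 and 5")
    # Odd-numbered items are positive: eight answers of 1..5 shifted to 0..32.
    positive = sum(responses[0::2]) - 8
    # Even-numbered items are negative: eight answers of 1..5 inverted to 0..32.
    negative = 40 - sum(responses[1::2])
    return (positive + negative) / 64 * 100
```

Under this rule, answering 5 to every odd item and 1 to every even item yields 100, and answering 3 throughout yields 50.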

4.2 Technical evaluation

4.2.1 Cloud response time

ADELA assistant uses several cloud services. This means that the system must have a stable and reliable internet connection to provide a feeling of seamless interaction.

The test consisted of asking ADELA a basic intent and measuring the time from when the recording was sent until ADELA began to speak. This process was repeated five times with different connection configurations. Results are shown in Table 3.

Table 3 Cloud services average response time depending on configuration in ADELA prototype

As mentioned earlier, ADELA will first be deployed for a clinical trial at Getafe University Hospital, where the average connection speed varies between 10 and 20 Mbps. Average response time will be around 2.7 s, which is considered appropriate to achieve a reasonable interaction.
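The measurement procedure described in this section (time one request, repeat five times, average) could be sketched as follows. The `ask` callable is a placeholder for sending the recording and waiting until speech starts; it and the function name are assumptions, since the text does not describe the measurement tooling.

```python
import statistics
import time


def average_response_time(ask, repetitions=5):
    """Call `ask` repeatedly and return the mean elapsed time in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        ask()  # placeholder: send the recording, block until the reply starts
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```

Running the same function once per connection configuration would give one average per row of a table such as Table 3.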

4.2.2 Reliability of AWS Lex

AWS Lex is the cloud service used in ADELA assistant to understand users’ intent from an audio recording. Therefore, the reliability of this service must be high to assist users.

The intent detection rate of AWS Lex is highly dependent on the way the user speaks, which must be strongly considered when working with older persons.

An evaluation of this service was undertaken to assess its effectiveness in identifying ten different intents with ten potential users of ADELA. Whenever no predefined intent or an erroneous intent was detected, the assistant prompted the user to repeat the question.
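The repeat-on-failure behavior can be sketched as a small loop. The `recognize` callable stands in for the call to the intent service, and `max_attempts` is a hypothetical limit not mentioned in the text. Note that this sketch only covers the case where no predefined intent is returned; detecting an erroneous but valid intent would need confirmation from the user, which is not modeled here.

```python
def resolve_intent(recognize, known_intents, max_attempts=3):
    """Ask until a known intent is recognized; return (intent, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        intent = recognize()  # placeholder for the intent-recognition call
        if intent in known_intents:
            return intent, attempt
        # No predefined intent matched: the assistant would ask the user to repeat.
    return None, max_attempts
```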

The percentage of misguided intents ranged from 0 to 30%. Users 1, 5, 6, 7, 9, and 10 did not exhibit any misguided intents during the observation period. Users 2 and 8 had a relatively low percentage of misguided intents (10%), while users 3 and 4 had 30% and 20% of misguided intents, respectively. The rate of misguided intents is somewhat variable among users; an average intent error rate of around 8% was observed. It was also found that all users were able to complete their intents on a second attempt in case AWS Lex did not resolve it the first time. Therefore, in case of understanding errors, the assistant recovers easily.
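As a quick check of these figures, the per-user percentages can be averaged directly. Taking the values exactly as listed gives a mean of 7%, close to the reported figure of around 8%; the text does not state how the reported average was computed, so the small difference is left unexplained here.

```python
from statistics import mean

# Percentage of misguided intents per user, as reported in the text.
misguided = {1: 0, 2: 10, 3: 30, 4: 20, 5: 0, 6: 0, 7: 0, 8: 10, 9: 0, 10: 0}

average_rate = mean(misguided.values())
users_without_errors = sorted(u for u, rate in misguided.items() if rate == 0)
```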

4.3 Refinement of the assistant

The analysis of the results from the refinement phase has led to several ideas on how to improve ADELA, which are described in Table 4. These improvements have made it possible to build a new version of ADELA ready to be deployed in a real environment.

Table 4 Problems in ADELA assistant detected in refinement phase with their respective solutions

5 Conclusions

ADELA is a research project aimed at designing and developing an intelligent conversational assistant to prevent delirium in hospitalized older people. To achieve this goal, the project went through several phases until a refined system, ready to be used in a clinical environment, was released.

ADELA was built through a co-creation phase in which a multidisciplinary team elicited the requirements of the assistant. Thanks to this co-creation phase, a first functional prototype of the conversational assistant was designed and built. This first prototype was subjected to a technical and usability evaluation process conducted with 10 potential end users (older adults). SUS and CUQ were used, yielding mean scores of 75.5 and 85.94, respectively.

Changes to improve the first ADELA design included, among others, simplifying voice interaction and providing the user with simple information on how to use the system. The technical evaluation made it possible to identify a minimum requirement of a 10 Mbps connection in the environment where the assistant is deployed.

It is important to acknowledge some limitations of this study that may have affected the interpretation of the results. First, the potential bias introduced by the population participating in the study; it is possible that those who volunteered had a higher interest or motivation to participate than others, which may have influenced the results. Additionally, it is important to note that the study was conducted with healthy older individuals, and not with those who would typically use the system in a real-world context (older persons with a condition leading to being hospitalized), which could limit the generalizability of the findings. From a usability standpoint, these limitations may result in a positive bias towards the system.

Furthermore, it should be noted that some technical aspects could also present several limitations. For example, the reliability of Lex was tested with a relatively small sample size of 10 individuals, which may not be sufficient to fully evaluate its reliability in different contexts. Moreover, due to the variety of voices, accents, and speech problems, the results may not be conclusive and may require further investigation. These limitations should be considered when interpreting the results of this study.

Finally, a second version of the assistant, ready to be deployed in a real environment, was released including the improvements drawn from the evaluation phase.

Future work will comprise deploying ADELA in a real scenario to assess its clinical validity in preventing delirium through a clinical trial. Furthermore, new features will be added to the system to enhance its behavior. These include incorporating Natural Language Processing (NLP) with models such as GPT-3 to expand the assistant's conversation options, reducing its cloud dependency by using offline services, and exploring new wake word detection methods to improve user experience.