Introduction

Voice assistants (VAs), also called intelligent personal assistants, are computer programs capable of understanding and responding to users using synthetic voices. Voice assistants have been integrated into different technological devices, including smartphones and smart speakers [1]. The voice modality is the central mode of communication used by these devices, rendering the graphical user interface (GUI) inapplicable or less meaningful [2]. People use VA technology in different aspects of their lives, from simple tasks like getting the weather report [3] or managing emails [4] to complex tasks like acting as client representatives [5] and as controllers in autonomous vehicles [6]. In other words, VAs can revolutionize the way people interact with computing systems [7]. Currently, there is massive global adoption of voice assistants. A report in [8] indicates that 4.2 billion VAs were in use in 2020 alone, with a projected increase to 8.4 billion by 2024. The popularity of VAs has led to greater research attention to their usability and user experience aspects.

Usability is a critical factor in the adoption of voice assistants [9]. A study by Zwakman et al. [10] highlighted the importance of usability in voice assistants, and an additional study by Coronado et al. [11] reiterated the importance of usability in human–computer interaction tools. Numerous studies have been carried out on the usability heuristics used in VAs, each adopting a unique approach. A study by Maguire [12] used both the Nielsen and Molich heuristics and a heuristic set adapted for the Voice User Interface (VUI) to evaluate the ease of use of VAs. The study affirmed that both sets of heuristics were appropriate; however, it noted that one was less problematic to use than the other [12]. A further study tested VUI heuristics to measure VA efficacy [13]. However, a critical factor that prevents VAs from adopting the currently available heuristics is the absence of a graphical user interface (GUI). Despite numerous studies on heuristics, the level of satisfaction is still low [14]. Furthermore, heuristics cannot be used as a standardized approach because they are approximate strategies or empirical rules for decision-making and problem-solving that do not ensure a correct solution. According to a study by Murad [16], the absence of standardized usability guidelines when developing VA interfaces presents a challenge in the development of effective VAs [15]. Another report, from Budi & Leipheimer [17], also suggests that the usability of VAs requires improvement and standardization [16]. To create a standard tool, a globally recognized and well-known organization is critical in the process because it eliminates bias and promotes neutrality [17]. The International Organization for Standardization (ISO) 9241-11 framework is one of the standard usability frameworks widely used for measuring technology acceptance.

According to the ISO 9241-11 framework, usability is defined as “the degree to which a program may be utilized to achieve measurable objectives with effectiveness, efficiency, and satisfaction in a specific context of usage” [18]. ISO 9241-11 provides a framework for understanding and applying the concept of usability in an interactive system and environment [19]. The main advantage of using the ISO standard is that industries and developers do not need to build different design measurement tools. The standard is intended to create compatibility with new and existing technologies and also to create trust [20]. Currently, system developers do not have any standardized tool created specifically for measuring VA usability; consequently, the measures are decentralized, causing confusion among developers. The lack of in-depth assessment of the current heuristics used in VA design affects users' trust and adoption [15]. Other emerging technologies, such as virtual reality [21] and game design [22], have understood the importance of creating an acceptable standardized measurement tool when designing new interfaces. Therefore, VA technology could also benefit significantly from the same concept. As evident from the above discussion, there is little to no focus on VA standardization.

Our study presents a systematic literature review of works carried out on the usability of voice assistants. In addition, we use the ISO 9241-11 framework as a standardized measurement tool to analyze the findings from the studies we collected. We chose the ACM and IEEE databases for the selection of our articles because both contain a variety of studies dealing with the usability aspects of VAs. The following are the contributions of this literature review to the Human–Computer Interaction (HCI) community:

  1. Our work highlights the studies currently carried out on VA usability, including the independent and dependent variables currently used.

  2. Our study highlights the factors that affect voice assistant acceptance and impact the user's overall experience.

  3. We identify and explain attributes unique to voice assistants, such as machine voice.

  4. We also highlight the evaluation techniques used in previous studies to measure usability.

  5. Finally, our study compares the existing usability studies with the ISO 9241-11 framework. The decentralized approach to VA usability measurement makes it difficult to determine whether the ISO 9241-11 framework is being adhered to when developing usability metrics.

We hope that our work will highlight how the existing VA usability measures can be integrated with the ISO 9241-11 framework. This will also verify whether the ISO 9241-11 framework can serve as a standard measure of usability in voice assistants. In summary, our study tries to answer the following four research questions:

  • RQ1: Can the ISO 9241-11 framework be used to measure the usability of VAs?

  • RQ2: What are the independent variables used when dealing with the usability of VAs?

  • RQ3: What current measures serve as the dependent variables when evaluating the usability of VAs?

  • RQ4: What is the relationship between the independent and dependent variables?

The remainder of this paper is structured as follows. The second section presents the related work, highlighting previous literature reviews on voice agent usability as well as the emergent technologies that have employed the ISO 9241-11 framework as a usability measurement tool. This is followed by the methodology section, which presents the review protocol together with the inclusion and exclusion criteria, the query created for the database search, and the databases selected. The fourth section presents the results and analysis; here, the articles used in this study are listed and the research questions are answered. The fifth section discusses the result analysis, including a more detailed explanation of the relationships between independent and dependent variables; our insights and observations are included in this section as well.

Literature Review

Previous Systematic Reviews

There have been a number of systematic literature reviews concerning VAs over the years. Table 1 presents the information for a few of the relevant works.

Table 1 Current literature reviews

As highlighted in Table 1, multiple systematic literature reviews have been carried out on VA usability over the years. However, each study has specific limitations and gaps for improvement. For instance, some studies focus on the usability of voice assistants used only in specific fields such as education [25] and health [36]. Other studies focus on the usability of voice assistants for specific age groups, such as older adults [28]. Likewise, although an in-depth analysis of VA usability involving every usability measure is carried out in [32], this study does not use the ISO 9241 framework as a measuring standard. On the other hand, the study in [33] does use the ISO 9241 framework as a measuring standard; however, its usage context is chatbots, which focus primarily on text-based communication instead of voice. Overall, the available literature reviews on VA usability listed in Table 1 support the view that very few of the current literature review studies on VAs use the ISO 9241-11 framework as an in-depth tool for measuring usability.

The ISO 9241-11 Usability Framework

The ISO 9241-11 is a usability framework used to understand usability in situations where interactive systems are used and employed, which includes environments, products, and services [39]. Nigel et al. [40] conducted a study to revise the ISO 9241-11 standard, which reiterates the importance of the framework within the concept of usability. A number of studies have been conducted on various technologies using the ISO 9241-11 framework as a tool to measure their usability, which shows the diversity of approaches when using the framework. For instance, a study by Karima et al. (2016) proposed using the ISO 9241-11 framework to measure the usability of mobile applications running on multiple operating systems, and identified display resolution and memory capacity as factors that affect mobile application usability [41]. Another study used the ISO 9241-11 framework to identify usability factors when developing e-government systems [42]; this study focused on the general aspects of e-government system development and concluded that the framework could be used as a usability guideline when developing a government portal. In addition, the ISO 9241-11 framework has also been used to evaluate other available methods and tools. For instance, a study by Maria et al. [44] used the framework to evaluate existing tools for measuring the usability of software products and artifacts on the web; the study compared existing tools with the ISO 9241-11 measures of efficiency, effectiveness, and satisfaction [43]. The ISO 9241-11 framework has also been employed as a standardization tool in the geographic field [44], game therapy in dementia [45], and logistics [46]. Despite the ISO 9241-11 usability framework being utilized in different aspects of old and emergent technologies, it has not previously been applied to VAs.

Methods

We performed a systematic literature review in this study using the guidelines established by Barbara [47]. These guidelines have been widely used in other systematic review studies as a result of their rigor and inclusiveness [48]. In addition, we added a quality assessment process to our guidelines. The quality assessment is a list of questions that we use to independently evaluate each study to ensure its relevance for our review. Our quality evaluation checklist is derived from existing studies [49, 50]. The complete guidelines used in this section comprise four stages:

  1. Inclusion and exclusion criteria

  2. Search query

  3. Database and article selection

  4. Quality assessment

Inclusion and Exclusion Criteria

The inclusion and exclusion criteria used in our study were developed for completeness and to avoid bias. The inclusion criteria are:

  a. Studies that focus on VAs, with voice being the primary modality. In scenarios where text or graphical user interfaces are involved, they should not be the primary focus.

  b. Studies written in English, to avoid mistakes during translation from another language.

  c. Studies that include at least one user and one voice assistant, to ensure that the focus is on usability, not system performance.

  d. Studies with a comprehensive conclusion.

  e. Studies released between 2000 and 2021, because voice assistants started to gain notable popularity during this period.

The exclusion criteria are:

  a. Studies with poor research design, where the study's purpose is not clear.

  b. White papers, posters, and academic theses.

Search Query

We created the search query for our study using keywords arranged to search the relevant databases. We reviewed previous studies to identify the search keywords most commonly used in usability studies. After numerous discussions among the researchers and after seeking two HCI experts' opinions, we chose the following set of keywords: usability, user experience, voice assistants, personal assistants, conversational agents, Google Assistant, Alexa, and Siri. We connected the keywords with logical operators (AND and OR) to yield accurate results. The final search string used was (“usability” OR “user experience”) AND (“voice assistants” OR “personal assistants” OR “conversational agents” OR “Google Assistant” OR “Alexa” OR “Siri”). The search was limited to the abstract and title of each study.
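
To illustrate the screening logic behind this string, the following minimal sketch (in Python) checks whether a record's title or abstract satisfies the two keyword groups joined by AND. It is not the actual ACM or IEEE query syntax; the record fields and the matches_query helper are illustrative assumptions.

```python
# Minimal sketch of the boolean screening logic; not the actual ACM/IEEE query syntax.
# The record fields ("title", "abstract") and this helper are illustrative assumptions.

USABILITY_TERMS = ["usability", "user experience"]
VA_TERMS = ["voice assistants", "personal assistants", "conversational agents",
            "google assistant", "alexa", "siri"]

def matches_query(record: dict) -> bool:
    """Return True if the title or abstract satisfies the search string."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    has_usability_term = any(term in text for term in USABILITY_TERMS)
    has_va_term = any(term in text for term in VA_TERMS)
    return has_usability_term and has_va_term

# Example usage:
record = {"title": "Usability of a voice assistant for older adults",
          "abstract": "We evaluate Alexa with 20 participants."}
print(matches_query(record))  # True
```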

Database and Article Selection

Figure 1 presents the selection and filtering process graphically; the figure is adapted from the PRISMA flow diagram [51]. As stated earlier, two databases were used as the sources for our article selection: the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). Both databases contain the most advanced studies on VAs and are highly recognized within the HCI community. The search query returned 340 results from the ACM database and 280 results from the IEEE database. The 720 items from both databases were checked for duplication, and 165 documents (23%) were found to be duplicates and removed. The remaining items were then filtered by title and abstract: we used keyword matching to search the title, whereas the abstract was read to check the eligibility criteria. In this step, 399 documents (72%) were removed because they did not meet the eligibility criteria. Finally, 121 documents that were not consistent with the research objectives of our study were removed. At the end of the screening process, 29 articles (19%) were included in this literature review.

Fig. 1 Article selection process

Quality Assessment

The items presented in Table 2 were used to assess the quality of the selected articles. This process was deployed to ensure that the reported contents fit our research. The sections collected from the articles, such as the methodology used, the analysis performed, and the context of use within each article, were vital to our study. Each question is scored on a three-point scale: “Yes” is scored as 1 point, meaning the question is fully answered; “Partial” is scored as 0.5, meaning the question is vaguely answered; and “No” is scored as 0, meaning it is not answered at all. All 29 finally included articles passed the quality assessment phase.

Table 2 Quality assessment checklist
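
As a concrete illustration of this scoring scheme, the following minimal sketch sums the per-question scores for one study. The example answers are illustrative assumptions; the actual checklist questions are those listed in Table 2.

```python
# Minimal sketch of the three-point quality scoring described above.
# The example answers are illustrative; the real questions are in Table 2.

SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}

def quality_score(answers: list[str]) -> float:
    """Sum the per-question scores ("yes"/"partial"/"no") for one study."""
    return sum(SCORES[a.lower()] for a in answers)

# Example: a study answering four checklist questions.
print(quality_score(["yes", "partial", "yes", "yes"]))  # 3.5
```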

Results and Analysis

List of Articles

This section lists and discusses the articles collected in the previous stage. Table 3 presents the list of all the compiled articles. Moreover, we identified the usability focus of each study.

Table 3 List of compiled articles

Voice Assistant Usability Timeline

We grouped the collected research into three categories, each representing a time frame (Fig. 2). The categorization is based on breakthroughs in voice assistant development. The first category spans 2000 to 2006, a period marked by the rise of social media and camera phones and by the Y2K bug in telecommunications. During these years, conversational agents started to get noticed with the introduction of inventions such as Honda's Advanced Step in Innovative Mobility (ASIMO) humanoid robot [80]. The second category ranges from 2007 to 2014. During these years, technological advances exposed more users to voice assistants by embedding them into smartphones and computers. For instance, Apple first introduced Siri in 2011 [81], and Microsoft introduced Cortana in 2014. The last category ranges from 2015 to 2021, when the mass adoption of voice assistants took place, reaching an all-time high.

Fig. 2 Year of publication of selected articles

Based on the year of publication of our selected articles, Fig. 2 clearly shows that research on VAs has expanded significantly in recent years (2014–2021). This can be attributed to the invention of smart speakers and phones with built-in voice agents [82]. Another reason for VA popularity is the COVID-19 outbreak, which has given fresh impetus to touchless interaction technologies such as voice [83].

Different Embodiment Types of VAs

Smart speakers are the most commonly used embodiment of VAs in our selected articles. This is due to the current popularity of commercial smart speakers such as Alexa, HomePod, etc. A 2019 study showed that 35% of US households had a smart speaker, with projections of 75% by 2025 [84]. The use of humanoids is also popular because usability measures such as anthropomorphism are essential for voice assistant usability [85]. Furthermore, Fig. 3 shows that only a few studies were done on car interface voice assistants. Car interfaces are voice assistants that act as intermediaries between the driver and the car; the VA car interface allows drivers to access car information and perform tasks without losing focus on driving. The fourth type, the software interface, refers to voice assistant software embedded inside smartphones or computers. Some of the studies we collected used the commercial form of the software interface, such as Alexa and Siri, while others developed new voice interfaces using programming code and skills, which are easily accessible to users due to the widespread adoption of smartphones and computers. Nevertheless, both take the form of software agents.

Fig. 3 Embodiment of voice assistant used in selected studies

Components of the ISO 9241-11 Framework

The ISO 9241-11 framework highlights two components: the context of use and the usability measures [18]. We concentrate on both components to highlight any correlations between usability metrics and the context of use in the selected articles. The context of use consists of the different independent variables along with the techniques used for analyzing them. Likewise, the usability measures represent the dependent variables, i.e., the effect that the independent variables have on the overall experience of the users. Accordingly, the analysis is presented in a bi-dimensional manner in the following sections.

Context of Use

Independent Variables

We split the context of use into the independent variables and the techniques used. The independent variables presented in our study are the physical and mental attributes used to measure a given user interaction outcome. Furthermore, our study grouped the independent variables into five main categories. The grouping, shown in Fig. 4, is based on similar themes identified in the collected studies. The five groups are people (user attributes), voice (voice assistant attributes), task, conversational style, and anthropomorphic cues. The voice and people categories are the oldest independent variables used to measure usability. Their relevance is also seen in recent studies, which indicates that researchers have a high interest in correlating users with VAs. On the other hand, anthropomorphic cues and conversational styles are relatively new to the measurement of usability. The task category is the most used variable of late, perhaps because users always test the VA's ability to perform certain tasks; this also indicates that VAs are widely used for various functional and utilitarian purposes. Anthropomorphic cues were seldom used in the second phase (2007–2014) but are the most widely used in the last range (2015–2021).

Fig. 4 Categories of independent variable use over the years

In Table 4 we provide more details on the different groups of independent variables collected, and also present examples of the independent variables in each category. We highlight how the independent variables have been applied in previous studies and in which environments they have been used. We define each independent variable category in Table 4 and explain their sub-categories as well. As is evident from Table 4, different independent variables, such as personality, gender, and accent, are used together in multiple studies; for example, voice and people independent variables are used simultaneously in various studies. Similarities between multiple independent variables aid in understanding the relationships between the variables themselves and their relationships with the usability measures. Furthermore, the table also highlights the kind of experiments carried out. Controlled experiments are effective methods for understanding the immediate cause and effect between variables. However, a noticeable drawback of controlled experiments is the absence of external validity: the results might not be the same when applied in real-world settings. For instance, a car simulation experiment is a controlled environment, whereas in real life a driver has no such control over the domain; the usability experience of the driver might be different in natural settings, and that might sometimes prove fatal.

Table 4 Independent variables and their categorization
Techniques Used

We identified seven techniques that researchers have used, as shown in Fig. 5. Based on the data we collected, quantitative experiments are the most used and the oldest technique applied to voice assistants. The quantitative method is sometimes used as a standalone experiment and sometimes with other techniques [54]. It is worth noting that car simulation experiments involving VAs were first used in 2000, while other experiments on human communication with self-driving cars have been carried out since the 1990s, making this one of the oldest techniques for usability measurement. More accurate techniques were introduced later, such as interaction design. The interaction design employed by studies such as [61] provides a real-time experiment scenario, which avoids drawbacks such as the bias that arises when using quantitative methods. Factorial designs are mainly used by studies that compare two or more entities in a case study [55], and they are utilized mostly by studies using two or more independent variables together.

Fig. 5 Techniques used in the selected studies over time

Usability Measure (Dependent Variable)

This subsection focuses on the usability measures identified in our research. The findings are used to answer RQ1 and RQ3. The ISO 9241-11 framework groups usability measures into three categories: effectiveness, efficiency, and satisfaction. According to the ISO 9241-11 framework, “effectiveness is the accuracy and completeness with which users achieve specified goals”, whereas “efficiency is the resources expended concerning accuracy and completeness in which users achieve goals” and “satisfaction is the freedom from discomfort and positive attitudes towards the use of the product” [18].
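
To make these definitions concrete, the following minimal sketch shows one common way the three measures are operationalized in usability studies (task completion rate, goals achieved per unit time, and mean questionnaire rating). These specific formulas and the 1–5 rating scale are illustrative assumptions rather than requirements of the standard.

```python
# Minimal sketch of common operationalizations of the three ISO 9241-11 measures.
# The exact formulas and the 1-5 Likert scale are illustrative assumptions.

def effectiveness(tasks_completed: int, tasks_attempted: int) -> float:
    """Accuracy and completeness: share of specified goals achieved."""
    return tasks_completed / tasks_attempted

def efficiency(tasks_completed: int, total_time_minutes: float) -> float:
    """Resources expended: goals achieved per unit of time."""
    return tasks_completed / total_time_minutes

def satisfaction(ratings: list[int]) -> float:
    """Subjective comfort and attitude, e.g. the mean of 1-5 ratings."""
    return sum(ratings) / len(ratings)

# Example: 8 of 10 tasks completed in 20 minutes, with four post-test ratings.
print(effectiveness(8, 10), efficiency(8, 20.0), satisfaction([4, 5, 3, 4]))
```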

In numerous studies, the usability measures used were clearly outside the scope of the ISO 9241-11 framework. In total, we identified three additional usability categories: attitude, machine voice (anthropomorphism), and cognitive load. The graphical representation of the different usability measures identified in this study is presented in Figs. 6 and 7. Furthermore, the figures also highlight the percentage of studies that used the usability measures in the ISO 9241-11 framework and those that used measures outside the framework. Based on our compiled results, user satisfaction and effectiveness are the earliest usability measures used when measuring VA usability. Some studies used performance and productivity as subthemes to measure effectiveness [62]. Usability has been measured both subjectively and objectively: for instance, some studies measured VA effectiveness subjectively using quantitative methods such as questionnaires [72], while others used objective methods such as the average number of completed interactions [69]. Multiple usability measures are sometimes applied in the same research; for instance, some studies measured effectiveness alongside efficiency and satisfaction [66, 70]. Learnability, optimization, and ease of use have been used as subthemes to measure efficiency. Interactive design is the most effective experimental technique employed, as it provides real-time results [56, 79]. The ISO 9241-11 framework works well for effectiveness, efficiency, and satisfaction; however, with the recent advances in VA capabilities, users have greater expectations of voice assistants. Our compiled results show that more than half of the studies were not carried out in accordance with the standard ISO 9241-11 framework (Fig. 7). The other usability measures we identified outside the ISO 9241-11 framework are attitude, machine voice, and cognitive load.

Fig. 6 Usability measures used over the years in our compiled articles

Fig. 7 Percentage of ISO 9241-11 framework usability measures versus non-ISO 9241-11 measures

Attitude is a set of emotions, beliefs, and behaviors towards the voice assistant. Attitude results from a person's experience and can influence user behavior; it is subject to change and is not constant. Understanding user attitude towards the VA has become an active research area. Numerous studies have used different methods to measure subthemes of attitude such as trust, closeness, disclosure, smartness, and honesty [60, 78]. Likeability is also a subtheme of attitude, and it has been used to measure the compatibility, trust, and strength of the relationship between the user and the VA [56, 57]. Moreover, embodiment type affects user attitude as well: a study highlighted how gaze affects user attitude toward a VA [59], showing that a VA with gaze creates trust.

We defined machine voice (anthropomorphism) as the user's attribution of human characteristics and human similarity to the voice assistant. We consider machine voice an important usability measure that applies only to voice assistants, because their primary modality is voice. Accordingly, measurement of machine voice has recently spiked and is clearly drawing a lot of interest. One of the direct purposes of the VA is to sound as human as possible: when users perceive the machine to be more human, more trust is built, which results in a better usability experience.

Cognitive load might be mistaken for efficiency; nevertheless, they are different. We defined cognitive load as the amount of mental capacity a person applies to communicate successfully with the VA. When it comes to VAs, actions such as giving commands require cognitive thought and planning. Cognitive load is measured by specific characteristics unique to the VA, such as attention time during the use of the VA [76] and the user's mental workload during use [77].

To answer RQ1 (can the ISO 9241-11 framework be used to measure the usability of VAs?): none of the existing works have used the ISO 9241-11 framework solely for the purpose of usability evaluation. It has been supplemented by the factors presented above that are outside the scope of this framework.

Relationship Between the Independent Variables and Usability Measures

After identifying the independent and dependent variables, Tables 5 and 6 show how they are interrelated, to provide a better understanding of the VA usability scenario. While Table 5 focuses on the ISO 9241-11 factors, Table 6 considers the non-ISO factors specifically.

Table 5 Relationship between independent variables and ISO 9241–11 framework measurement
Table 6 Relationship between independent variables and non-ISO 9241-11 framework measurement

The independent variables are grouped into categories and represented by the table rows, with every category consisting of multiple independent variables. The usability measures are presented in the table columns; every usability measure is made up of different sub-themes, which are all presented in the tables as well. The tables highlight the relationships between the independent variables and the usability measures. An “X” mark in a cell indicates that a study exists linking that independent variable with that usability measure subtheme, whereas an empty cell indicates that no study has been carried out linking that usability measure and independent variable.
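
The structure behind these tables can be sketched as a simple cross-tabulation mapping each (independent variable, usability subtheme) pair to the studies linking them; empty cells then surface research gaps. The variable names and study identifiers below are illustrative assumptions, not the actual contents of Tables 5 and 6.

```python
# Minimal sketch of the cross-tabulation behind Tables 5 and 6.
# The variable names and study IDs are illustrative, not the tables' actual contents.
from collections import defaultdict

matrix: dict[tuple[str, str], list[str]] = defaultdict(list)

# One entry per reviewed study that relates an independent variable to a subtheme.
matrix[("accent", "effectiveness")].append("S12")
matrix[("gender", "satisfaction")].append("S07")

def cell(variable: str, subtheme: str) -> str:
    """Return "X" if at least one study links the pair, else an empty cell."""
    return "X" if matrix[(variable, subtheme)] else ""

print(cell("accent", "effectiveness"))  # X
print(cell("gender", "efficiency"))     # empty cell: a research gap
```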

Discussion

Independent Variables and Usability Measures

Our study revealed what has previously been carried out in VA usability research and the gaps that are yet to be addressed. We analyzed the usability measures and their relationships to the independent variables. VAs are easily accessible due to the development of different embodiment types such as speakers, humanoids, and robots. However, there is far less focus on embodiment types and their relationship to effectiveness and anthropomorphism, which needs more attention. Some relationship gaps and associations are apparent, while some are vague. For instance, the independent variable “accent” has often been connected with its effect on users; however, what is left unanswered is whether VA accents have the same efficacy for users of the same or different genders. Another notable gap is gender and efficiency, with very few studies on that relationship. This will be an essential aspect to understand and apply given the recent massive adoption of voice assistants in different contexts. Another obvious gap is the relationship between query expression and any of the ISO 9241-11 framework measures. Query expression is how a user expresses their query to the voice assistant; it has been shown to increase the user's trust in and attitude towards the VA, but its relationship to usability measures such as efficiency, satisfaction, and effectiveness is still under-researched. Knowing the right way to ask queries (questions) determines the type of response a user gets: an incorrect response will be received if the question is expressed incorrectly. From a mental model perspective, when a user must expend too much energy and thought to frame a question, this affects VA efficiency and satisfaction; however, this has not been proven by any current study.

VA response types increase effectiveness and trust; however, their relationship to user acceptance is still unknown. Another interesting intersection is between anthropomorphic cues and attitude, since attitude results from an emotional rather than a practical response to anthropomorphism. Attitude is an emotional response to a given state, hence its strong connection with anthropomorphism. The attitude toward the VA is a highly researched area [86]. Trust, likeability, and acceptance are subthemes of the attitude usability measure. This can be attributed to the importance of trust while using emergent technologies such as voice assistants. User trust in voice assistants is an essential aspect with the rise of IoT devices, and user mistrust affects the acceptance and effectiveness of VAs [87]. Multiple studies measured user trust while using machine voice categories as an independent variable. That could be attributed to the lack of a GUI in VAs: the voice modality alone must be enough to cultivate user trust. Noticeably, subjective methods were widely employed when measuring user attitudes; although subjective measures often relate well to the variables they are intended to capture, they are also affected by cognitive biases.

The ISO 9241-11 framework is an effective tool for measuring effectiveness, efficiency, and satisfaction. However, it is not applicable when measuring usability aspects such as attitude, machine voice, and cognitive load, all of which are measurements uniquely associated with voice assistants. Therefore, the ISO 9241-11 framework could be expanded to include such usability aspects.

Technique Employed

The factorial design adapts well to matched-subject design experiments [56]. Based on the studies collected, machine learning is not widely used as an analytic tool in usability research; this could be attributed to the technical complexity of machine learning and the fact that it is still a relatively new field. However, with third-party machine learning tools, more such analysis will be carried out. Wizard of Oz and interactive design started gaining popularity in 2015–2021. Moreover, the Wizard of Oz and interactive techniques are more effective when using independent variables such as anthropomorphic cues: the anthropomorphic cue independent variables are used with Wizard of Oz techniques and interaction design more than with any other technique. This could be attributed to the importance of using objective methods to avoid biased human responses. Furthermore, “machine voice” is a fairly popular usability measure, which could be attributed to VA developers trying to give the VA more human and intelligent attributes. The more users perceive the machine voice as intelligent and humanlike, the more they trust and adopt it. More objective techniques should be created and applied to the independent variables when measuring machine voice. Subjective techniques such as quantitative methods are easy to use and straightforward; however, they can produce biased results.

Interactive design experiments are the most commonly employed technique for measuring usability. However, the interaction depends on the voice modality, which makes it different from traditional interaction design, where visual cues are an essential component. Moreover, interaction design also triggers an emotional response, which makes it effective when measuring user attitude. The absence of visual elements in the interactive design used might arguably defeat the purpose of clear communication. A new standard of interaction design specifically for the voice modality should be developed.

Future Work and Limitations

One limitation of our study was using only a few databases as our article sources; in future studies, we intend to add more databases such as Scopus and Taylor & Francis. The majority of the experimental studies we collected were conducted in controlled environments; future studies should focus on usability measures and independent variables used in natural settings so that the results can be compared. More studies should also be carried out on objective techniques and on how they can cooperate with subjective techniques. This is vital because, with the rise of user expectations of voice assistants, it will be essential to understand how techniques complement each other in each usability measurement.

Conclusion

Our study aimed to understand what is currently employed for measuring voice assistant usability, and we identified the different independent variables, dependent variables, and techniques used. Furthermore, we also focused on using the ISO 9241-11 framework to measure the usability of voice assistants. Our study identified five classes of independent variables used for measuring the dependent variables; these classes were formed based on similarities between their member groups. In addition, our study used the three usability measures in the ISO 9241-11 framework, in conjunction with three others, as the dependent variables. We found that voice assistants such as car interfaces have not been studied enough, while smart speakers currently receive the most focus. Dependent variables such as machine voice (anthropomorphism) and attitude have recently received more attention than older usability measures such as effectiveness. We also found that usability depends on the context of use; for example, the same independent variables can be used with different usability measures. Our study highlights the relationships between the independent and dependent variables used by other studies. In conclusion, our study used the ISO 9241-11 framework to analyze usability, and we highlighted what has been carried out on VA usability and what gaps remain. Moreover, we concluded that even though a great deal of usability measurement has been carried out, there are still many aspects that have not been researched. Furthermore, the current ISO 9241-11 framework is not sufficient for measuring recent advancements in VAs because user needs and expectations have changed with the rise of technology. Using the ISO 9241-11 framework alone will create ambiguity in explaining some usability measures, such as machine voice, attitude, and cognitive load. However, it has the potential to be a foundation for future VA usability frameworks.