Task Automation Intelligent Agents: A Review

Abstract: As technological advancements increase exponentially, mobile phones become smarter with machine learning and artificial intelligence algorithms. These advancements have allowed mobile phone users to perform most of their daily routine tasks on mobile phones; tasks performed in daily routines are called repetitive tasks and are performed manually by the users themselves. However, machine learning and artificial intelligence have enabled those tasks to be performed automatically, which is known as task automation. Users can automate tasks, e.g., by creating automation rules or by using an intelligent agent such as a conversational agent or a virtual personal assistant. Several techniques to achieve task automation have been proposed, but this review shows that task automation by programming by demonstration has had massive developmental growth because of its user-centered approach. Apple Siri, Google Assistant, MS Cortana, and Amazon Alexa are the best-known task automation agents. However, these agents are not widely adopted because of their usability issues. In this study, two research questions are evaluated through the available literature to expand the research on intelligent task automation agents: (1) What is the state-of-the-art in task automation agents? (2) What are the existing methods and techniques for developing usability heuristics, specifically for intelligent agents? Research shows groundbreaking developments have been made in mobile phone task automation recently. However, this development must still follow usability principles to achieve maximum usability and user satisfaction. The second research question further justifies developing a set of domain-specific usability heuristics for mobile task automation intelligent agents.


Introduction
Task automation uses technology to carry out tasks that humans would otherwise perform. It has become significantly important because it has the potential to improve efficiency, reduce errors, and free up time for more complex and creative work [1]. Automating tasks reduces the time and effort required to complete them, allowing users to focus on higher-level, value-added tasks. Automated tasks are also less prone to errors and mistakes than manual tasks, which can improve quality and user satisfaction. Currently, task automation is used in a wide range of industries, including manufacturing, healthcare, finance, and customer service. For example, in manufacturing, task automation streamlines production processes, reduces errors, and increases efficiency [2]. In healthcare, task automation helps improve patient care by automating the administering of medication and the tracking of patients' vital signs [3]. In finance, task automation processes transactions, generates reports, and performs other tasks efficiently [4]. In customer service, task automation is used to provide quick and accurate responses to customer inquiries through chatbots [5], virtual personal assistants/intelligent agents [6], or recommender systems [7].
Over the last decade, the usage of conversational agents such as virtual personal assistants and intelligent agents has increased exponentially [8,9]. These conversational agents
• RQ.1. What is the state-of-the-art in task automation intelligent agents, and do these intelligent agents use any usability guidelines in the development process?
• RQ.2. What are the existing methods and techniques for developing usability heuristics, and for which domains have they been developed? Are there any domain-specific usability heuristics for evaluating intelligent agents?
Developing usability heuristics for task automation intelligent agents is important because of the nature of intelligent agents, which requires interaction with humans; this interaction should be natural and intuitive, and the agent should be able to understand and respond to user input in a way that is accurate and reliable, also ensuring that the agents are user-friendly and easy to use. Developing domain-specific usability heuristics for task automation intelligent agents can help ensure that these agents meet the requirements and provide guidelines for designing agents that are understandable and easy to use. Overall, developing usability heuristics for task automation intelligent agents is crucial and necessary to provide a positive and effective human-computer interaction experience.

Methodology
This study uses the search strategy previously used by Qiu et al. [24]. This is a systematic methodology that includes conducting a search on databases with search terms and analyzing articles based on inclusion and exclusion criteria. After the articles are analyzed, a full-text review is conducted on selected articles.

Data Collection
In April 2022, a literature search explored the available intelligent agents for task automation from 1994 to 2023. Similarly, a literature search explored the available usability heuristics from 2018 to 2023. The search for usability heuristics was restricted to 2018 onward because a systematic literature review had already been conducted by Quiñones and Rusu [25].

Search Terms
The search terms used for the five databases were "Intelligent Task Automation Agents" and "Usability Heuristics OR Intelligent Agent". Both search terms were adapted to each database and used in different combinations.

Databases Searched
The relevant articles were searched on five different databases widely used by researchers of the human-computer interaction community: Scopus, Web of Science, Association for Computing Machinery (ACM) digital library, ScienceDirect, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore. These research databases provide full-text journals and research papers published on intelligent agents, task automation, and usability heuristics.
Future Internet 2023, 15, 196

Article Selection
This study follows the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, in an updated version by Page et al. [26]. The procedure of article selection is as follows: A search strategy using the terms mentioned in Section 2.1.1 was conducted for articles on intelligent agents from 1994 to 2023 and for articles on usability heuristics from 2018 to 2023.
Duplicates were removed, and titles and abstracts were evaluated against inclusion and exclusion criteria.
Inclusion (quality criteria) and exclusion criteria are presented as follows:
• In inclusion or quality criteria, the articles were first selected based on the title. After the relevant titles were separated from the search results, the articles were filtered by reading the abstracts. Articles were included in the review if they had studies focusing on the system design of task automation intelligent agents, design of intelligent personal agents/assistants, intelligent virtual assistants, virtual personal assistants, user-centered design for task automation, or contained usability heuristics for intelligent agents, development of heuristics, proposed heuristics, user evaluations, or heuristics to evaluate intelligent user interfaces.
• In exclusion criteria, articles were excluded or rejected if they were not written in the English language, were duplicate reports of the same studies or systems from various sources, studies that had no design of intelligent agents for task automation, or studies that did not propose or develop any usability heuristics to evaluate intelligent agents or intelligent user interfaces.
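The selection procedure above (duplicate removal, then title screening, then abstract screening, with non-English articles excluded) can be sketched as a small filtering pipeline. This is only an illustrative sketch: in the review the screening was done by human reading, so the keyword matching, the `INCLUDE_TERMS` list, and the `Article` record are hypothetical stand-ins rather than the authors' actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    abstract: str
    language: str = "English"


# Hypothetical keyword list; the review does not publish exact screening terms.
INCLUDE_TERMS = (
    "task automation",
    "intelligent agent",
    "virtual personal assistant",
    "usability heuristic",
)


def normalize(title: str) -> str:
    """Lower-case and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def remove_duplicates(articles):
    """Keep the first report of each title and drop later duplicates."""
    seen = set()
    unique = []
    for article in articles:
        key = normalize(article.title)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


def mentions_topic(text: str) -> bool:
    """Crude stand-in for a reviewer's relevance judgement."""
    lowered = text.lower()
    return any(term in lowered for term in INCLUDE_TERMS)


def screen(articles):
    """Apply exclusion (language, duplicates) and inclusion (title, then abstract)."""
    selected = []
    for article in remove_duplicates(articles):
        if article.language != "English":
            continue  # exclusion: not written in English
        if not mentions_topic(article.title):
            continue  # stage 1: title screening
        if not mentions_topic(article.abstract):
            continue  # stage 2: abstract screening
        selected.append(article)
    return selected
```

Articles that survive `screen` would then go forward to the full-text review described in the methodology.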

Data Analysis
The articles for RQ.1 were coded in terms of (a) reference, (b) domain, (c) research aim, (d) findings, and (e) limitations. This review only analyzes the available task automation systems and intelligent agents that use the programming-by-demonstration approach because of its user-centered design.
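As a rough illustration of this coding scheme, each reviewed article can be held as a record with the five coded fields and then grouped by domain. The `CodedArticle` class and `group_by_domain` helper are hypothetical names introduced for this sketch; the field values in the usage note are paraphrased from the descriptions in this review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CodedArticle:
    reference: str      # (a) citation key
    domain: str         # (b) e.g., desktop, web, mobile
    research_aim: str   # (c)
    findings: str       # (d)
    limitations: str    # (e)


def group_by_domain(rows):
    """Return a mapping from domain to the references coded under it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.domain].append(row.reference)
    return dict(groups)
```

For example, coding Pursuit [27] and Sikuli [34] under "desktop" and Ringer [42] under "web" would yield two groups, which mirrors how the results for RQ.1 are organized by platform.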

RQ.1
To answer the research question (RQ.1), twenty-one task automation systems and intelligent agents were selected for the review. Figure 1 visually represents the work undertaken in task automation using end-user development's programming-by-demonstration approach. The early task automation systems and intelligent agents were mostly desktop-based. The first task automation system using the programming-by-demonstration approach was Pursuit by Modugno and Myers [27], which enables users to create abstract programs directly containing variables, loops, and conditionals within the interface. It made programming easy for the users but hardly automated their tasks and still needed to be supported by a user study. Being the first programming-by-demonstration system of its kind, Pursuit did not adhere to standardized usability heuristics, and the user interface was developed based on the developers' experience. One of the main reasons is that Nielsen's heuristics were introduced the same year this system was developed, making it hard for researchers to consider usability [28]. SMARTedit, another programming-by-demonstration system, allowed users to automate repetitive text-editing tasks using learning techniques drawn from a machine-learning concept called version space algebra, through which it could learn useful text-editing procedures after only a few demonstrations. The limitations of this system were that it only worked for text editing, and no usability evaluation was performed [29].
Another task automation system that similarly used version space algebra was CHINLE by Chen and Weld [30], which automatically constructed programming-by-demonstration systems for applications based on their interface specifications. CHINLE was built atop SUPPLE, was not supported by a usability evaluation or user study, had fixed-length loops, and could not use logical connectives inside conditionals. DocWizard, another programming-by-demonstration system, presented a novel algorithm in the Eclipse platform for automatically capturing follow-me documentation wizards by observing experts performing procedures. The limitation of this system was that it lacked features for the author to specify where user inputs were required, and it violated several usability principles, e.g., visibility of system status, user control and freedom, and error prevention [31]. FlashFill used inductive programming to create opportunities for task automation for non-programmers by synthesizing functional or logic programs for general-purpose tasks. However, the major challenges of this research were compositionality (i.e., requiring more usability studies) and making inductive programming more cognitive (i.e., ease of use or a simple and minimalistic design). Other challenges were domain change, validation, and noise tolerance [32].
Another system generates photo manipulation tutorials: the author demonstrates the manipulation using an instrumented version of GIMP that records all interface and application state changes, and the system automatically generates tutorials from the recordings that illustrate the manipulation using images, text, and annotations [33]. These tutorials are not supported by a usability evaluation and therefore have no feedback or error correction method; the semantic tool is based on computer vision and needs to learn macros from multiple demonstrations rather than one and to generalize them. Another GUI-based task automation system is Sikuli, which uses a similar visual approach to search and automate GUIs using screenshots. No usability evaluation was performed at any stage of development; therefore, it had the following issues and limitations: it did not work as expected under theme variation and background changes, and it had visibility constraints because it operated only on visible screen elements and could not handle invisible GUI elements, e.g., those hidden underneath other windows or tabs or scrolled out of view [34]. Another GUI-based intelligent agent is Bespoke, a system synthesizing custom GUIs by observing user demonstrations of command-line applications. Theoretically and structurally, Bespoke seemed like a promising task automation system; however, in practice, a user study was performed instead of a usability study. A user study is equally important in the development of a system; however, it has some key differences, e.g., different goals or focus, methods, and stages in the design process. No testing of the GUIs synthesized by Bespoke on end-users to assess their benefits and limitations was performed, because the user study was conducted within a lab setting where the developers were present to guide study participants. Secondly, the authors needed a more rigorous assessment of user performance on a controlled set of tasks. Lastly, no comparison of Bespoke against alternative GUI creation methods was conducted by Vaithilingam and Guo [35].
Ruler is an interactive visualization system that synthesizes labeling rules using span-level interactive demonstration over document examples. It relieves users from the burden of writing labeling functions and enables them to focus on higher-level semantic analysis, such as identifying relevant signals for the labeling task. However, it is domain-dependent, as it was developed for data labeling, and no usability evaluation was undertaken because of this domain dependency [36]. A state-of-the-art desktop-based task automation system is Help-It-Looks-Confusing (HILC), a system prototype proposing a user-in-the-loop framework that learns to generate scripts of actions performed on the visual elements of a GUI. A user study against the available baseline system, Sikuli, was conducted, which showed that Sikuli struggled to assist users in most of the test experiments. HILC accomplished simple linear tasks and complicated tasks that spanned multiple applications. The user study showed promising results favoring HILC; however, this state-of-the-art system has limitations as well, e.g., basic actions are occasionally misclassified when none has a high probability; it works without awareness of the state of the computer; it requires short fixed-length sleep commands after each action to account for computer loading time because the system cannot know whether the operating system's task has finished or a webpage has loaded; the current appearance models have a fixed-size aspect ratio, which decreases accuracy when items are of different sizes; processing a video tutorial takes longer because the system has to analyze every frame for the mouse and keyboard button status; and errors propagate through the pipeline due to inaccurate log-file generation, which occurs because videos from the internet contain noise and compression artifacts introduced by recording software and websites' video-sharing policies [37].
In that study, the user study was centered solely on the end-user, with little attention paid to the system itself. We contend that conducting a usability study instead of a user study would have mitigated the identified limitations.
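One of the HILC limitations noted above, the fixed-length sleep after every action, can be contrasted with polling for a readiness condition. The sketch below is not HILC's implementation; it assumes a hypothetical `is_ready` callback that can observe system state, which is exactly the signal HILC is described as lacking, and all function names are illustrative.

```python
import time


def run_with_fixed_sleep(actions, delay_s):
    """Replay actions with a fixed pause after each one (the approach described for HILC)."""
    for action in actions:
        action()
        time.sleep(delay_s)  # waits the full delay even if the system is already ready


def wait_until(is_ready, timeout_s=5.0, poll_s=0.05):
    """Poll a readiness check until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    return is_ready()  # one final check at the deadline


def run_with_readiness_checks(steps, timeout_s=5.0):
    """Replay (action, is_ready) pairs, continuing as soon as each step is ready."""
    for action, is_ready in steps:
        action()
        if not wait_until(is_ready, timeout_s):
            raise TimeoutError("step did not become ready in time")
```

With a readiness signal, replay neither waits longer than needed nor proceeds too early; without one, as in a purely vision-based system, a fixed delay remains the simple fallback.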
X-Droid is a framework that provides Android app developers with the ability to produce functional prototypes of applications quickly and easily. With this framework, developers can create a new app that imports various functionalities from other applications without understanding their implementation and source code. The limitations of X-Droid are its responsiveness; its lack of support for reading images, sounds, or videos from the user interface; its lack of support for customized user interfaces such as interfaces managed by custom game engines; and the fact that it can exploit external resources because the server cannot distinguish between X-Droid and regular users [38]. The researchers in this study performed a usability study to evaluate the application programming interface (API) and showed that the system was usable and easy to understand. However, as this system is domain-specific and intended primarily for Android developers rather than the general population (i.e., normal users), it does not align with, nor is it deemed a significant contribution to, the research questions of this study. Nonetheless, the usability evaluation is worth mentioning since the system was developed using a programming-by-demonstration (PbD) approach.

Web-Based Task Automation Systems and Intelligent Agents
d.mix is a tool for creating web mashups that leverages site-to-service correspondence. The user browses annotated websites and selects samples, and d.mix's sampling mechanism generates the underlying service calls that yield those elements. The limitations of this system are that the coexistence of two different sampling strategies caused confusion about how to separate a dataset; in a user study, participants had difficulty switching between multiple languages interspersed in a single page; documentation and error handling in the wiki environment were insufficient compared to other tools; and wiki-hosted applications did not scale well beyond prototypes for a few users. These issues similarly arise because a user study was performed instead of a usability study, which differs in goals, methods, and design process stages [39]. CoScripter is a collaborative scripting environment for recording, automating, and sharing web-based processes [40]. A user study was performed instead of a usability evaluation for this task automation system. It was deployed in an office set-up, and over fifty corporate employees volunteered to incorporate it into their work practices. However, with usage over time, issues of reliability and robustness provoked the need for advanced features due to upgrades in system interaction. Vegemite, an extension of CoScripter, was introduced with a spreadsheet-like environment that used direct manipulation and programming-by-demonstration techniques to automatically populate tables with information collected from various websites. However, the intelligent agent did not consider the response time of the web servers for requested data and did not support automatic or semi-automatic data cleaning [41].
Ringer is also a web-based task automation system in which a user demonstration serves as input and produces a script that interacts with the page as a user would. Ringer has limitations in each of its constructs. For the action construct, some document object model (DOM) events occur at a remarkably high rate; because JavaScript is single-threaded, the same thread that records and replays each event must also process webpage interactions, so recording an exceptionally large number of high-frequency events can make pages slow to respond. For the element construct, the similarity-based node addressing approach is inherently best-effort and has no theoretical guarantees; however, in practice, it is sufficient. For the trigger construct, Ringer was designed for interactions that satisfy the trigger assumptions and fails when these do not hold. An overall limitation of Ringer is the possibility of failure due to client-side delays such as animations, timeouts, local storage issues, etc. [42]. Rousillon is a programming system based on relation selection and a generalization algorithm for writing complex web automation scripts by demonstration. This system allows the user to demonstrate how to collect the first row of a universal table view of a hierarchical dataset to teach Rousillon how to collect all rows. The limitation of Rousillon is that it focuses on realistic datasets, particularly distributed and hierarchical data [43].
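To make the idea of best-effort, similarity-based node addressing concrete, the following sketch scores candidate page elements by how many attributes they share with the element recorded during the demonstration and picks the closest one. It is not Ringer's actual algorithm: the attribute dictionaries, the scoring rule, and the function names are assumptions made purely for illustration.

```python
def similarity(recorded, candidate):
    """Fraction of attributes (over the union of keys) with identical values."""
    keys = set(recorded) | set(candidate)
    if not keys:
        return 0.0
    same = sum(1 for key in keys if recorded.get(key) == candidate.get(key))
    return same / len(keys)


def find_best_match(recorded, candidates):
    """Return the candidate most similar to the recorded element, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: similarity(recorded, candidate))
```

Because the best-scoring candidate is returned even when the score is low, such an approach carries no guarantee that the chosen node is the intended one, which matches the best-effort character described above.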

Mobile-Based Task Automation Systems and Intelligent Agents
Assistive Macros by Rodrigues [44] presented an accessibility service to enable users to perform a sequence of commands with a single selection. The user could create these macros manually or automatically by detecting repeated interactions with a mobile phone. Assistive Macros showed excellent results, but neither a usability evaluation nor a user study was performed, and it was supported by only one case study; it also needed to support data with contextual information. Another mobile task automation system is InstructableCrowd, a crowdsourcing approach that allows users to create trigger-action (if, then) rules based on their needs via conversation [45]. The limitations of this system are that it does not focus on the robust creation of rules via conversation; it does not have a repository of common if-then patterns; it provides no feedback on the successful creation of rules; and there is no way to validate if-then rules created by the users. In addition, a user study was performed but not supported by a usability evaluation. VASTA, a vision and language-assisted programming-by-demonstration system for smartphone task automation, overcomes three key challenges: (1) how to make a particular demonstration robust to positional and visual changes; (2) how to recognize changes in the automation parameters to make the demonstration as generalizable as possible; (3) how to recognize from the user's utterance what automation the user wishes to carry out [46]. VASTA is applicable across a wide range of domains because it avoids the complex engineering required to fetch, parse, and interpret various markup languages; vision-guided techniques can be used to supplement traditional methods; and it has no dependency on the user interface's underlying markup language. Therefore, it can be applied to systems where none is available, e.g., interfaces rendered with low-level graphic libraries. As with the systems discussed above, this study is supported by a user study rather than a usability evaluation. The limitations of this intelligent agent are (1) semantic labeling of user interface elements: it only records a feature representation of user interface elements to track and find elements later while executing; (2) extensible markup language (XML) data with computer vision: automation might fail due to the object detection network's mistakes.
In the literature, two task automation intelligent agents are state-of-the-art in mobile phone task automation. One is Sugilite, and the other is DoThisHere.
Sugilite is a multimodal programming-by-demonstration system that enables users to create smartphone automation, which can be performed by giving voice commands. It uses the Android accessibility application programming interface (API) to support automating arbitrary tasks in any Android application. If the user gives a verbal command that Sugilite does not know how to execute, it prompts the user to demonstrate by directly manipulating the regular apps on the user interface. Sugilite has some limitations, which are discussed in a dissertation by Li [14]. It has usability issues in text entry that violate standard usability principles; another major concern is privacy and security while sharing generated scripts; it also does not support confirming crucial steps, e.g., online orders and purchases, and performing undo-able tasks, which is also a violation of standard usability principles. Figure 2 shows the working of Sugilite. The user interacts with the intelligent agent through voice commands. If the intelligent agent does not know how to perform a certain task, the user demonstrates the task on the user interface, and the intelligent agent observes and learns.
Similarly, DoThisHere is also a multimodal interaction technique that streamlines cross-app tasks and reduces the burden of performing tasks imposed on users. It allows users to use voice commands to refer to information or app features that are off-screen and touch to specify where the relevant information should be inserted or displayed. This allows users to transfer information to other apps with less context switching [47]. The user study showed that several of the system's features went unused. Secondly, there is a time delay after the user taps on the icon before the system can listen to voice commands and user interface selections. The limitation of this system is that it uses a virtual assistant framework design to handle ambiguity and provide feedback; it also requires improvement in the user interface element selection algorithm.
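The interaction loop described for Sugilite above (execute a known command, otherwise ask the user to demonstrate and remember the demonstration) can be sketched as follows. This is a minimal sketch, not Sugilite's implementation: the `DemonstrationAgent` class, its `handle` method, and the representation of a demonstration as a list of step strings are all hypothetical.

```python
class DemonstrationAgent:
    """Toy agent that learns a script the first time a command is demonstrated."""

    def __init__(self):
        self.scripts = {}  # normalized command -> list of recorded steps

    def handle(self, command, demonstrate=None):
        """Replay a known command, or learn it from a user demonstration."""
        key = " ".join(command.lower().split())
        if key in self.scripts:
            return "replayed", list(self.scripts[key])
        if demonstrate is None:
            # Unknown command and nobody available to demonstrate it.
            raise LookupError(f"unknown command, demonstration needed: {command!r}")
        steps = list(demonstrate())  # observe the user's actions
        self.scripts[key] = steps
        return "learned", steps
```

In a first call, the user would supply a demonstration callback returning the steps they performed; a later call with the same command replays the stored steps without a demonstration. A usable agent would additionally need the confirmation and undo support whose absence is noted as a limitation above.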

RQ.2
To answer the research question (RQ.2), thirty-nine review papers were selected for full-paper review. Similarly, Figure 2 visually represents the heuristic development methodology within the last half-decade. Figure 2 describes the methods and techniques adopted by researchers to achieve their goals. In studies based on existing heuristics, researchers have used heuristics established by practitioners and designers to develop or propose their own domain-specific heuristics. In studies based on literature reviews, researchers have used the literature to develop heuristics. Similarly, in studies based on mixed processes, researchers have used different methods and techniques to develop heuristics, i.e., heuristic evaluation, usability testing, questionnaires, interviews, experiments, or even tools. In studies based on usability problems, researchers conducted a user study, interviews, usability testing of existing products, etc., to develop heuristics. However, only one researcher developed usability heuristics using universal design theory.

Development of Heuristics in Domains
Among the heuristics developed, Table 2 shows the domains in which heuristics are developed or proposed.

Analysis

Health Information Systems
Among the reviewed studies, only two research papers present work on the usability evaluation of health information systems. Tremoulet et al. [71] used a mixed method to develop usability heuristics for health information systems. A literature review was conducted, clinical interviews were conducted to gather qualitative data, and a survey was conducted to gather quantitative data. The data gathered were reviewed, issues were identified within the health information system, and usability heuristics were generated based on the identified issues. The developed heuristics were then cross-checked with existing heuristics to eliminate overlapping heuristics. The new heuristics were then implemented in the health information system, and validation was performed through heuristic evaluation by expert reviewers.
Another study by Bouraghi et al. [49] used Nielsen's existing heuristics to evaluate the health information system. They initially performed an expert evaluation of the system and identified usability errors, which were then categorized by severity. Validation was not performed because the study did not propose any heuristics.
Health information systems share several commonalities with intelligent agents, such as data processing, decision support, personalization, communication and interaction, integration with other systems, security, and privacy. While there are commonalities between intelligent agents and health information systems, it is important to note that they also have distinct features and purposes. Intelligent agents focus on intelligent decision-making and user interaction, while health information systems are specifically designed to manage and exchange healthcare-related data and support clinical workflows. Therefore, usability heuristics development methods for both domains could be similar, or general usability principles may overlap, but specific considerations and priorities reflect the unique characteristics and requirements of their respective domains and interaction modalities.

Online Websites
A systematic methodology for the development of usability heuristics by Quiñones and Rusu [25] has been adopted by several researchers to develop domain-specific usability heuristics; Saavedra et al. [63] developed usability heuristics for social networks using a similar methodology. A domain-specific literature review was conducted, and the eight stages for systematic development of heuristics were performed. The developed heuristics were evaluated by experts, the results were analyzed, and future work was presented. In contrast, Huang [80] performed a case study in which the author used mixed processes to evaluate tourism websites. A user-centered approach to empirically assess the websites was undertaken, which involved conducting a literature review, selecting websites, selecting tasks to be performed by participants, developing a usability questionnaire for assessment, collecting and analyzing data, and discussing the results. In this work, no validation process was performed because the heuristics were generated after the literature review and experiment.
Similarly, Krawiec and Dudycz [48] used Nielsen's heuristics to evaluate a website and identify usability errors. After identification, errors were categorized, and new categories were added to Nielsen's heuristics. Experts evaluated the proposed heuristics, and the results showed that the new categories were helpful in detecting more usability errors. A mixed development method was adopted by Zardari et al. [70] to develop usability heuristics for an e-learning website. The study included heuristic evaluation, usability testing, a user experience questionnaire, and eye tracking to identify usability errors. Heuristics were developed and used to identify errors. The designers and developers fixed the identified errors, and after fixing the errors, another usability test was completed to validate the proposed heuristics. A questionnaire survey was conducted to gather qualitative feedback.
Online websites and intelligent agents also share some commonalities, such as communication and interaction with diverse users, information access, personalization, automation, and integration. While there are similarities, it is also important to note the distinct characteristics and purposes. Online websites primarily serve as platforms for presenting information and conducting online transactions, while intelligent agents focus on conversational interactions, task automation, and providing personalized assistance. Therefore, usability heuristics development methods also require a focus on different priorities and considerations due to key differences in interaction modalities, information presentation, and user inputs.

Virtual Learning Environments
Usability heuristics to evaluate virtual labs were proposed by Kumar et al. [53] after conducting a literature review. These heuristics were a combination of Nielsen's and Nokelainen's heuristics. The researchers performed a literature review and selected heuristics relevant to virtual labs. The proposed heuristics were then evaluated using available virtual lab platforms, and an analysis was made. The results showed that the proposed usability heuristics were more helpful in detecting usability issues. The proposed heuristics were not validated because they were developed after a literature review. A similar approach was adopted by Vieira et al. [64] to develop usability heuristics for evaluating the usability of educational games. The development method used in this research was based on a literature review. It was conducted in four phases: identification of articles to be selected for review, triage (study selection and exclusion), inclusion of articles after eligibility checks, and analysis. The results showed that the proposed heuristics could not be validated, as the field of knowledge was still beginning to develop.
Using Nielsen's heuristics, systematic usability heuristics were developed by Figueroa et al. [51]. This study also adopted the development method proposed by Quiñones and Rusu [25] and validated the heuristics through experiments, heuristic evaluation, case studies, and user tests. A post-pandemic study was conducted by Ismail et al. [69] to evaluate online learning environments such as Zoom and Teams. This study was initiated by conducting a literature review and selecting usability heuristics relevant to online learning environments. The proposed heuristics were used to evaluate the two most widely used platforms, and an analysis was conducted. No validation process of heuristics was carried out because heuristics were developed from the literature review.
Virtual learning environments and intelligent agents have similarities such as personalized learning, adaptive learning, timely support and assistance, automation, and feedback and assessment. Despite these similarities, virtual learning environments focus on providing a digital platform for instructional content delivery, interaction, and assessment. Intelligent agents, on the other hand, emphasize personalized assistance, automation, and conversational interactions for enhanced learning experiences. General usability principles may overlap with those for intelligent agents, but specific considerations and priorities reflect each platform's unique characteristics and objectives.

Mobile Applications
In the last half-decade, massive mobile application growth has been observed across different fields and domains. Similarly, the human-computer interaction field has been trying to catch up with the fast-paced development in the mobile industry. Several usability heuristics and evaluation methods have been proposed and implemented by academicians and industry experts to enhance the usage of mobile phones, according to Hasan et al. [88].
A literature review conducted by Da Costa et al. [62] for the quality assessment of mobile phones proposed usability heuristics. These heuristics could not be validated, and the authors intended to use them in empirical validation, allowing dynamic incorporation and improvements. Similarly, a literature review by Salah et al. [68] proposed usability heuristics for evaluating m-commerce applications in Arabic. This study also did not validate the proposed usability heuristics because the heuristics were developed after a literature review, and the authors planned to assess different interfaces in Arabic. Another study by Kumar et al. [76] developed usability heuristics for mobile learning applications using a mixed-process approach. They used Nielsen's heuristics and a literature review to develop specific heuristics, which were then used to categorize usability problems. An expert evaluation was performed to validate the proposed heuristics.
The only study that uses existing heuristics other than Nielsen's was conducted by Sancho Nascimento et al. [59] to develop usability heuristics for mobile game applications for children with Down syndrome. This study used existing heuristics proposed by Breyer evaluation, Able Games Association, and Recommendations of Preece, Sharp, and Rogers to develop the game. The game was evaluated by usability experts and endorsed by a walkthrough with health professionals. Another set of usability heuristics was developed for mobile games by Robson and Sabahat [84]. They identified usability problems in existing game applications and analyzed the gathered data. After data analysis, usability heuristics were developed and implemented by creating a game prototype. The implemented usability heuristics were validated by expert gamers evaluating the game prototype.
A literature review was conducted by Abreu et al. [65] to evaluate the usability of children's education applications. Usability heuristics were developed by reviewing the literature, and expert evaluators and an experiment evaluated the developed heuristics. Expert evaluators included teachers, researchers in child education, specialists in HCI, and researchers in computing in education. Another study by Limtrairut et al. [56] proposed heuristics for an m-learning application and developed a prototype using existing heuristics. The proposed heuristics were validated by seven experts evaluating the prototype.
After conducting a literature review and a questionnaire survey of users, design recommendations for mobile stock exchange applications were presented by Hussain et al. [58]. Expert evaluators evaluated the developed heuristics by reviewing the existing applications and identifying problems. The analysis was conducted, and design recommendations were presented as results. Similarly, usability heuristics to evaluate financial technologies were developed by Ali et al. [54] based on Nielsen's heuristics. Bashir et al. [55] presented usability heuristics for fitness-related, context-aware mobile applications based on existing usability heuristics. The researchers reviewed the literature, developed domain-specific heuristics, and evaluated existing applications. After the problems were identified, the heuristics were refined and validated. The validation was performed through two evaluation studies and one usability expert review. A similar approach using Nielsen's heuristics was adopted by Faria Gomes et al. [50], but this study focused on iOS applications. The methodology was fairly similar; however, the validation was performed by conducting a system usability scale (SUS) questionnaire.
For evaluating children's education mobile applications, Samarakoon et al. [79] conducted an experiment with preschool children and observed them while they interacted with the tablet's interface. The problems were identified, and new heuristics were developed based on the observations. The developed heuristics were implemented in the interface, and another experiment was conducted for validation. Similarly, Eltalhi et al. [82] evaluated a children's education application in three steps: pre-test, post-test, and usability test. However, no validation was performed because this study did not propose usability heuristics.
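The SUS questionnaire mentioned above yields a single score between 0 and 100 per respondent. The following is a minimal sketch of the standard SUS scoring rule (ten items on a 1 to 5 scale, odd-numbered items positively worded and even-numbered items negatively worded). The function name `sus_score` is hypothetical, and the reviewed studies do not state that they computed their scores exactly this way.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` holds the ten answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution is 5 - answer.
            total += 5 - answer

    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


if __name__ == "__main__":
    # A respondent who answers 3 to every item lands at the midpoint.
    print(sus_score([3] * 10))
```

Scores from several respondents are typically averaged to obtain one value per evaluated application.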
One study validated usability heuristics during the design phase through user testing, including a mix of surveys, concurrent think-aloud, and interviews for feedback on the prototype. This study was conducted by Kim et al. [83] and evaluated a disease app. The methodology used to generate usability heuristics was based on usability problems gathered by creating personas, conducting competitor analysis, heuristic evaluation, and user interviews. Another study used universal design theory to propose usability heuristics. After conducting a literature review, heuristics were developed by the researcher. The developed heuristics were evaluated and validated by mixed methods, including usability testing and user-experience evaluation, and the designers addressed the results.
Usability heuristics for mobile applications and intelligent agents also share commonalities such as user-centered design, interaction design, error prevention and recovery, and consistency and familiarity. However, there are also some differences in terms of interaction modalities, context and portability, presentation and content, and multimodal capabilities. Similarly, both differ in unique characteristics and specific considerations.

Intelligent Agents
Usability heuristics for speech-based smart devices and intelligent agents differ based on their specific characteristics and intended uses. To evaluate the speech-based smart devices, a literature review was conducted on existing heuristics, and domain-specific heuristics were developed by Wei and Landay [57]. After the development of heuristics, an evaluation of speech-based smart devices was carried out. Expert evaluators validated proposed heuristics. Speech-based smart devices differ in terms of physical interaction. Usability heuristics for speech-based smart devices consider physical aspects of interaction, such as wake word detection, microphone sensitivity, and voice recognition accuracy.
Similarly, usability heuristics for voice user interfaces also differ from intelligent agents in terms of task-oriented interactions. To evaluate Voice User Interfaces (VUIs), a usability study was conducted using System Usability Scale (SUS), Post-Study System Usability Questionnaire (PSSUQ), heuristic questionnaires, and interviews by Pyae [85]. The data gathered from the usability study was analyzed, and usability heuristics were formulated from the analysis; therefore, no validation of heuristics was required. The usability heuristics for voice user interface focus on designing interactions for specific tasks or use cases, such as clear and concise prompts, appropriate dialog flow, and accurate recognition of user commands.
Usability heuristics to evaluate chatbots were developed by Sánchez-Adame et al. [78]. In their study, the researchers conducted a literature review and developed usability heuristics based on their experience developing related applications and systems. The researchers adopted a modified version of the methodology of Quiñones and Rusu [27] for this study and performed six stages of systematic heuristic development. Chatbots are also considered intelligent user interfaces; however, they differ from intelligent agents in terms of conversational flow. Usability heuristics for chatbots prioritize natural and coherent conversations, addressing factors such as contextual understanding, maintaining conversational context, and generating appropriate responses based on user inputs.
While there may be some overlaps in general usability principles, the specific considerations and priorities in usability heuristics for speech-based smart devices, voice user interfaces, chatbots, and intelligent agents reflect the unique characteristics and objectives of each system or platform. Speech-based smart devices focus on physical and audio-related aspects, voice user interfaces emphasize task-oriented interactions, chatbots prioritize conversational flow, and intelligent agents aim for adaptive and personalized experiences.

Other Domains
Usability heuristics to evaluate information architecture framework for academic library websites are proposed by Silvis et al. [67]. The methodology used for heuristic development is based on a literature review. An analysis was conducted, proposed heuristics were implemented in websites, and recommendations and reviews were provided, but no heuristics were validated.
A tool to evaluate the usability of games using Nielsen's heuristics was used by Yanez-Gomez et al. [74], and usability problems were identified. Domain-specific heuristics were developed based on the problems identified and implemented into the games. After the developed heuristics were implemented, a preliminary evaluation was conducted on two games. Analysis suggested that the proposed heuristics successfully identified usability problems in the games.
Usability heuristics to evaluate the interface for Arabic m-commerce applications were developed by Salah et al. [68], which were also used to evaluate the system interface in the Arabic language by Muhanna et al. [60]. These heuristics were developed by conducting a systematic literature review, and usability experts validated the proposed heuristics. The developed heuristics successfully detected usability issues and violations in interfaces of the Arabic language.
A literature review on Nielsen's heuristics and user tests was conducted to develop usability heuristics for evaluating systems with tabletop interfaces by de Franceschi et al. [75]. This study adopted a modified version of Quiñones and Rusu [25], and a prototype was developed with proposed heuristics. The proposed heuristics were validated by a case study where multiple users used the prototype.
A set of usability heuristics developed by Umar et al. [52] focused on enhancing the usability of systems used for children's education, also known as child computer interaction (CCI), using Nielsen's heuristics. This study was carried out by conducting a use-case study with children and identifying the usability issues. Nielsen's heuristics were modified and evaluated by experts. The finalized heuristics were implemented in a prototype and another round of user testing validated the heuristics. The prototype developed with the proposed heuristics was more usable for children's education.
For evaluating interactive web maps, Marquez et al. [66] used the eight stages proposed by Quiñones and Rusu [25] to develop usability heuristics systematically. The eight stages include exploratory study, experimentation, descriptive, correlation, selection, specification, validation, and refinement. The developed heuristics were validated by experts performing the heuristic evaluation.
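The eight stages listed above can be treated as an ordered checklist when planning or auditing a heuristic development effort. The sketch below is illustrative only: the `Stage` enum and the helpers `missing_stages` and `next_stage` are hypothetical names, and the strictly sequential ordering is a simplifying assumption, since studies in this review also adopt modified versions of the methodology.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight stages as named in the text, in the order listed."""
    EXPLORATORY_STUDY = 1
    EXPERIMENTATION = 2
    DESCRIPTIVE = 3
    CORRELATION = 4
    SELECTION = 5
    SPECIFICATION = 6
    VALIDATION = 7
    REFINEMENT = 8


def missing_stages(completed):
    """Return the stages not yet completed, in methodology order."""
    done = set(completed)
    return [stage for stage in Stage if stage not in done]


def next_stage(completed):
    """Return the earliest stage not yet completed, or None if all are done."""
    remaining = missing_stages(completed)
    return remaining[0] if remaining else None


if __name__ == "__main__":
    # Hypothetical project that has finished the first three stages.
    done = [Stage.EXPLORATORY_STUDY, Stage.EXPERIMENTATION, Stage.DESCRIPTIVE]
    print(next_stage(done).name)
    print([stage.name for stage in missing_stages(done)])
```

A checklist of this kind makes it easy to see, for example, whether a given set of heuristics ever reached the validation stage.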
To evaluate set-top box and television interfaces, Kaya et al. [73] performed a mixed process involving problems identified by the developers, three experts with cognitive walkthroughs, and customer complaints to develop usability heuristics. Based on the gathered data, the researchers developed usability heuristics, and experts evaluated those heuristics by creating clusters of problems previously identified. A validation checklist was created based on cluster analysis. The proposed heuristics were validated by user testing, expert judgment, and heuristic evaluation.
Viana et al. [81] applied usability heuristics in a machine-learning system for data labeling. A preliminary study was conducted to identify the usability problems in an existing system. The gathered data were analyzed and compared to Nielsen's heuristics. The finalized heuristics were used to develop the labeling system.
Existing heuristics could not be used for evaluating augmented reality (AR) or mixed reality (MR); therefore, Derby et al. [61] conducted a literature review to develop usability heuristics for these systems. For the heuristic development, the eight stages proposed by Quiñones and Rusu [25] were used, and the validation was performed through expert reviews, heuristic evaluation, and user testing of the systems. A similar approach of eight-stage heuristic development for evaluating progressive web applications was also adopted by Anuar et al. [72]; however, the validation was performed by five experts from academia and industry on applications from three different domains: cultural heritage, stock photo industry, and marketplace.
Usability heuristics for evaluating a hospital-based computerized decision support system (CDSS) were developed by Marcilly et al. [77] using a mixed approach, where the heuristic evaluation of the existing system was performed. Concurrently, the researchers conducted questionnaires and interviews with hospital staff. Cross-checking of collected data was performed, and after a comprehensive analysis, the heuristics were validated during the design phase through user testing.

Discussion
The field of intelligent task automation systems and intelligent agents has rapidly grown over the past few years. While most of these systems are currently desktop-based, recent advancements in mobile phone devices have led to the development of mobile-based task automation systems. However, despite these advancements, limitations still need to be addressed.
This review study concludes that one of the major causes of the limited adoption of such mobile-based task automation systems and applications is the unavailability of domain-specific usability heuristics for developers and designers to develop easy-to-use and user-friendly systems. Even with high-speed processors and ample RAM, these systems can still present usability issues. Only if a user understands how a certain function works or why certain functions exist will they be able to use the device to its full potential. Otherwise, many of the system functionalities will remain unknown to the users due to poor interface design. Additionally, even if users understand how to perform certain tasks and activities, they cannot always be sure that the system or application will perform as expected.
This study also suggests that the human-computer interaction community needs to give more attention to developing systematic domain-specific usability heuristics, such as for task automation intelligent agents, because these systems have the potential to make human life easier. To effectively utilize this potential, usability is an essential aspect to consider during the design and development of such systems and applications. In addition to usability issues, there are other limitations to mobile-based task automation systems and applications; for example, they may struggle to handle large amounts of data or complex tasks as effectively as desktop-based systems. Additionally, they may be constrained by battery life and storage capacity, which can limit their usefulness for certain tasks; addressing these constraints can be considered future work.
Despite these limitations, there is a significant amount of development in the field of mobile-based task automation systems and applications. However, as discussed in research question R.Q.2, most proposed or developed usability heuristics have been focused on domains other than task automation intelligent agents. This raises the question of why there is a significant amount of development in one area, while the human-computer interaction community is exploring other fields. While it is valuable to explore other fields, it is essential to consider the need for rapid advancements and the demands of daily life. Mobile technology has become an integral part of people's lives, and mobile-based task automation systems or applications have the potential to provide numerous benefits to users. Therefore, R.Q.2 concludes that it is essential to undertake systematic efforts to support the development of mobile technology and mobile-based task automation systems and applications.
This review also suggests some approaches to overcome these limitations. One potential approach is to design mobile-based task automation systems and applications with a user-centered approach, which involves users in the design process to ensure that their needs and preferences are taken into account. Additionally, usability testing can be conducted to identify potential issues and make necessary improvements before releasing the product. Another approach is to incorporate more machine learning and other artificial intelligence techniques into mobile-based task automation systems. These techniques can help improve the efficiency and effectiveness of these systems and applications, making them more useful for a wider range of daily life tasks. While there are limitations to mobile-based task automation systems, they have the potential to provide numerous benefits to users. To realize this potential, the human-computer interaction community needs to give attention to usability issues and undertake systematic efforts to support the development of mobile technology. By doing so, we can create mobile-based task automation systems that are efficient, effective, and user-friendly.
The development of usability heuristics for intelligent agents that automate tasks is essential due to the unique nature of these agents, which necessitates human interaction. This interaction should feel natural and intuitive, with the agent being capable of accurately and reliably understanding and responding to user inputs. It is also important to ensure that the agents are user-friendly and easily accessible. By creating domain-specific usability heuristics for task automation intelligent agents, we can help ensure that these agents meet the requirements and offer guidelines for designing agents that are comprehensible and straightforward to use. Ultimately, the development of usability heuristics for task automation intelligent agents is vital for providing a positive and efficient human-computer interaction experience.

Conclusions
In conclusion, this study shows the availability and potential of a wide range of research work that could be carried out in this domain. This study also confirms the need for usability heuristics to be developed in the future so that usable task automation intelligent agents can be built effectively and efficiently. Developing usability heuristics for task automation intelligent agents is a vital aspect to consider while creating effective human-computer interaction experiences. These intelligent agents are designed to interact with humans, and thus, it is crucial that the interaction be natural and intuitive. To achieve this, it is essential to develop usability heuristics that can guide the design process of these agents, ensuring that they are user-friendly, easy to use, and accurately respond to user inputs. These agents can be designed for a specific domain, such as healthcare, finance, or customer service, where the requirements for user interaction might vary, and domain-specific usability heuristics can be developed to ensure that the agents meet the necessary criteria.
The study also shows the development of usability heuristics for task automation intelligent agents and systems which considers the intelligence and automation aspects of the devices interacting with the users multimodally, e.g., voice, gestures, contextually aware techniques, etc. The usability heuristics for task automation intelligent agents should aim to provide guidelines for creating agents that are easy to learn and use, with a minimal cognitive load on the user. The heuristics should focus on aspects such as the visibility of system status, which ensures that the user is aware of the agent's current state and that the feedback provided to the user is relevant and timely. Additionally, the heuristics should aim to reduce the need for the user to remember complex commands or procedures, and instead, provide recognition-based interactions that allow users to easily recognize the desired action.
Overall, developing usability heuristics for task automation intelligent agents is crucial to ensure that these agents meet the user's needs, are easy to use, and provide a positive user experience. The heuristics can guide the design process and help create agents that accurately respond to user inputs while reducing cognitive load and ensuring user satisfaction.

Data Availability Statement:
Not applicable.

Conflicts of Interest:
The authors declare that they have no conflict of interest/competing interests.