Empowering High-Level Spinal Cord Injury Patients in Daily Tasks With a Hybrid Gaze and FEMG-Controlled Assistive Robotic System

Individuals with high-level spinal cord injuries often face significant challenges in performing essential daily tasks due to their motor impairments. Consequently, the development of reliable, hands-free human-computer interfaces (HCI) for assistive devices is vital for enhancing their quality of life. However, existing methods, including eye-tracking and facial electromyogram (FEMG) control, have demonstrated limitations in stability and efficiency. To address these shortcomings, this paper presents an innovative hybrid control system that seamlessly integrates gaze and FEMG signals. Deployed as a hybrid HCI, this system has been successfully used to assist individuals with high-level spinal cord injuries in performing activities of daily living (ADLs), including eating, pouring water, and pick-and-place tasks. Importantly, our experimental results confirm that the hybrid control method expedites pick-and-place performance, achieving an average completion time of 34.3 s, a 28.8% and 21.8% improvement over pure gaze-based control and pure FEMG-based control, respectively. With practice, participants experienced up to a 44% efficiency improvement using the hybrid control method. This state-of-the-art system offers a highly precise and reliable intention interface, suitable for daily use by individuals with high-level spinal cord injuries, ultimately enhancing their quality of life and independence.


I. INTRODUCTION
Individuals with high-level spinal cord injuries frequently encounter considerable difficulties in carrying out essential tasks such as grasping and feeding due to motor impairments caused by their condition [1]. These tasks, critical for maintaining independence and quality of life, often become challenging, necessitating the development of innovative technologies that can assist in their execution. One promising area of research lies in the development of effective hands-free human-computer interfaces (HCI) [2]. These interfaces, when integrated with assistive devices, have the potential to greatly facilitate activities of daily living (ADLs) for those affected by high-level spinal cord injuries [3].
In the field of hands-free HCI, accurately recognizing user intent poses a significant challenge [4], [5]. With mechanical devices replacing limbs, it becomes crucial to establish new pathways for intention transmission from the brain to external devices, ultimately restoring their functionality [6]. In upper limb prosthetic control, employing residual muscles is an intuitive approach [7]. However, for individuals with high-level disabilities, such as those with C5-C8 spinal cord injuries, this method proves infeasible due to the near-complete loss of upper limb function. Consequently, researchers have developed multi-degree-of-freedom assistive robotic arms (ARA) tailored for this specific population, focusing on leveraging effective neural signal sources such as electromyogram (EMG) [8], head movements [9], eye tracking [10], and electroencephalography (EEG) data [11].
Brain-computer interfaces (BCIs) offer a unique capability to detect user intent without the involvement of muscle or limb activity, presenting a promising solution for individuals with high-level spinal cord injuries [12]. Consequently, EEG-based BCIs have garnered interest as an alternative hands-free approach to controlling assistive devices [13]. These BCIs translate brain signals into computer commands, enabling users to manipulate devices through thought alone [14]. Despite the potential advantages, EEG-based BCIs face several challenges, including low signal-to-noise ratios, lengthy training periods, and the requirement for regular calibration [15]. Head movement tracking systems use cameras or sensors to translate head movements into computer commands [16]. These systems can be efficient for some users; however, they require sufficient neck muscle control, which may not be available to individuals with high-level spinal cord injuries. Voice recognition technology enables users to control computers and digital devices using vocal commands [17]. Although voice recognition is an effective hands-free method, it is susceptible to interference from background noise, and its discrete control commands and slower response times make it ill-suited to continuous, real-time tasks. The aforementioned HCI methods have no direct relationship with the target's position. Instead, they rely on decoding intention commands and gradually approaching the target through visual feedback [18], [19]. This type of approach may result in less user-friendly and inefficient human-machine interactions [20]. To enhance the overall user experience and efficiency of hands-free interfaces, incorporating more intuitive and direct target localization methods would be advantageous.
Gaze points offer an unparalleled advantage, as they capture real-time spatial coordinates of targeted objects [21]. This characteristic facilitates the simultaneous representation of user intent and object position, setting gaze points apart from other biological signals and creating a more immediate and intuitive method for HCI [22]. Maimon-Dror et al. achieved continuous 3D endpoint control of a robotic arm support system by means of 3D gaze tracking with a binocular eye tracker [23]. McMullen et al. developed a semi-autonomous hybrid brain-machine interface that combines human intracranial EEG, eye tracking, and computer vision to control a robotic upper limb prosthetic for grasping tasks [24]. Li et al. [25] introduced a novel gaze vector method to accurately estimate the three-dimensional coordinates of observed objects in a real-world environment; participants successfully controlled a robotic arm to grasp objects by directly gazing at the target object.
Nonetheless, extracting intentional eye movement signals remains challenging, as eye movements are continuously occurring biological signals [26], [27]. While previous research has demonstrated that EEG and EMG signals show consistent relationships with eye movements [28], [29], and that factors such as pupil diameter and fixation can indicate users' implicit intentions [30], [31], the overall accuracy of intention recognition is still limited. This limitation hinders the development of efficient and accurate applications for individuals with severe disabilities.
Hence, there is an urgent need for a reliable eye-tracking interface that can accurately identify object locations and user intentions for individuals with high-level spinal cord injuries. To address this issue, this paper presents an intuitive hybrid control method that combines gaze and facial EMG (FEMG) to assist people with high-level spinal cord injuries in carrying out ADLs. Real-time gaze data are filtered to generate stable gaze points, which are then converted into viable grasping points for target planning and grasping using a unified coordinate system. Furthermore, a lightweight convolutional neural network (CNN) is employed to recognize four voluntary FEMG patterns to effectively confirm the user's intended gaze point. This system offers a highly precise and dependable intention interface, making it appropriate for daily use by individuals with severe disabilities.
The main contributions of this paper can be summarized as follows:
1) An intuitive hybrid control system has been developed, combining gaze and FEMG to assist individuals with high-level spinal cord injuries in carrying out daily grasping and feeding tasks.
2) A real-time decoding method for real-world target coordinates has been implemented using a unified coordinate system, coupled with a lightweight CNN to accurately confirm the user's intended gaze point.
3) The performance of the hybrid method is compared with the individual control methods (FEMG-based control and gaze-based control). The results demonstrate that the hybrid approach significantly improves pick-and-place task performance for the target population.
The remainder of this article is organized as follows: Section II details the methods employed in this study, while Section III presents the corresponding experimental results. In Section IV, the research findings are discussed, and future research prospects are proposed. Finally, Section V concludes the paper.

II. METHODS

A. Hybrid Assistive Robotic System
In this research, the assistive robotic system integrates a JACO arm (a lightweight, six-degree-of-freedom ARA from Kinova Robotics, Canada) into the human-computer interface, as shown in Fig. 1. The HCI, based on gaze and FEMG, employs a hybrid control strategy, allowing individuals with high-level spinal cord injuries to use gaze to locate target positions and facial muscle activity to express their active intentions. This process aligns FEMG control commands with gaze data, which amounts to extracting the coordinates of intentional gaze. To enhance the stability of the gaze point while maintaining immediacy of the current gaze intent, an exponential weighted average is adopted to address instability of the gaze fixation point caused by factors such as saccades, as sketched below. The gaze point in the user's coordinate system (eye-tracker coordinate system) is then transformed into the world coordinate system (ARA coordinate system) through a unified coordinate system for target planning and grasping by the ARA. A QR code, strategically placed next to the wheelchair, is employed to bridge the eye-tracking coordinate system and the ARA coordinate system, effectively integrating them within a unified coordinate system. The FEMG pattern reflects the user's active intention to express control instructions for gaze and tasks. Here, a lightweight CNN is utilized to identify four kinds of active intentions to ensure accuracy and real-time performance.
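For illustration, a minimal sketch of the exponential-weighted-average step is given below; the smoothing factor `alpha` and the class name are assumptions, as the paper does not report the filter parameters.

```python
import numpy as np

class GazeSmoother:
    """Exponential weighted average over 3D gaze points.

    Damps saccade-induced jitter while still tracking the current
    fixation. The smoothing factor alpha is a placeholder value,
    not one reported in the paper.
    """

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.state = None

    def update(self, gaze_xyz):
        """Feed one raw gaze sample (x, y, z); return the smoothed point."""
        gaze_xyz = np.asarray(gaze_xyz, dtype=float)
        if self.state is None:
            self.state = gaze_xyz
        else:
            # EWA: new estimate = alpha * sample + (1 - alpha) * old estimate
            self.state = self.alpha * gaze_xyz + (1.0 - self.alpha) * self.state
        return self.state

smoother = GazeSmoother(alpha=0.2)
stable_point = smoother.update([0.12, -0.03, 0.55])  # metres, user frame
```

At the eye tracker's 40 Hz sampling rate, a smaller alpha trades responsiveness for stability against saccades.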
Development of a unified coordinate system is also highlighted in this work. Given that the eye-tracker captures object positions relative to the user coordinate system, these coordinates cannot be directly utilized as execution coordinates for the ARA. This necessitates the establishment of a unified coordinate system encompassing the human-robot-environment interface. In response, a QR code has been affixed adjacent to the ARA. Prior calibration of this code with the transformation matrix of the ARA is required. Such an arrangement proves advantageous for individuals with high-level spinal cord injuries, as it forms an integrated system with the ARA and wheelchair, thereby obviating the need for external sensors in the environment or on a table to establish a coordinate system. This approach significantly enhances the independence of the human-environment-robot system and alleviates environmental constraints, thereby facilitating individuals with high-level spinal cord injuries in performing ADLs across diverse scenarios.

Fig. 1. The hybrid hands-free HCI designed to assist individuals with high-level spinal cord injuries. The assistive robotic system comprises an ARA, a wheelchair, and a hybrid HCI that combines gaze tracking and FEMG. The eye tracker captures the user's 3D gaze, and the gaze point in the user's coordinate system is transformed into the world coordinate system using a unified coordinate system for target planning and grasping. The Delsys EMG device detects FEMG patterns that reflect the user's active intention to express four control instructions: confirming gaze point (CG), approaching the object (AO), grasping the object (GO), and releasing the object (RO). A lightweight CNN is employed to identify four types of active intentions, ensuring both accuracy and real-time performance. The 3D position of target objects and users' intended actions can be decoded and seamlessly integrated to control assistive robotic systems. The images used in this study have been obtained with written consent from the subjects, allowing the display of their identifiable features.
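The paper does not publish its QR-decoding implementation; the sketch below shows one plausible realization with OpenCV of the step that links the scene camera to the QR code, assuming a printed QR code of known size and a calibrated scene camera. The function name, QR size, and frame conventions are illustrative assumptions.

```python
import cv2
import numpy as np

QR_SIZE = 0.10  # printed QR edge length in metres (assumed value)

# 3D corners of the QR code expressed in its own (linking) frame
QR_CORNERS_3D = np.array(
    [[0, 0, 0], [QR_SIZE, 0, 0], [QR_SIZE, QR_SIZE, 0], [0, QR_SIZE, 0]],
    dtype=np.float32,
)

def decode_qr_pose(frame, camera_matrix, dist_coeffs):
    """Estimate the transform from the scene-camera (user) frame to the
    QR (linking) frame from one image; returns a 4x4 matrix or None."""
    found, corners_2d = cv2.QRCodeDetector().detect(frame)
    if not found:
        return None
    ok, rvec, tvec = cv2.solvePnP(
        QR_CORNERS_3D, corners_2d.reshape(4, 2).astype(np.float32),
        camera_matrix, dist_coeffs)
    if not ok:
        return None
    T_user_Qr = np.eye(4)                     # QR frame -> camera frame
    T_user_Qr[:3, :3] = cv2.Rodrigues(rvec)[0]
    T_user_Qr[:3, 3] = tvec.ravel()
    return np.linalg.inv(T_user_Qr)           # camera frame -> QR frame
```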

B. Human Gaze Tracking

1) Gaze Point Extraction and Transformation:
The eye-tracking system employs the Tobii Pro Glasses 2 wearable eye-tracker, capturing the user's three-dimensional gaze point coordinates. Calibration is conducted for each user prior to data acquisition, which enhances the precision of gaze point recognition. The eye-tracker operates at a sampling rate of 40 Hz, capturing a set of gaze point coordinates every 0.025 seconds, thereby satisfying the real-time requirements of the system. The unified coordinate system enables the transformation of user coordinates into those of the ARA, seamlessly integrating the ARA, user, and environment. It comprises three coordinate systems: the eye-tracker coordinate system $O_{user}$ (user coordinate system), the QR code coordinate system $O_{QR}$ (linking coordinate system), and the ARA coordinate system $O_{ARA}$ (robot coordinate system). As the eye-tracking and ARA coordinate systems operate independently, the unified coordinate system is established via the linking coordinate system. The formulas are shown below:

$$T^{ARA}_{Pose} = T^{ARA}_{Qr} \, T^{Qr}_{Pose} \quad (1)$$

where $T^{ARA}_{Pose}$ represents the target position of the ARA, and $T^{ARA}_{Qr}$ represents the coordinate transformation between the ARA and the QR code, a fixed matrix within the system. $T^{Qr}_{Pose}$ links the user coordinate system and the ARA coordinate system, and is obtained by the following formula:

$$T^{Qr}_{Pose} = T^{Qr}_{user} \, T^{user}_{Pose} \quad (2)$$

where $T^{user}_{Pose}$ represents the real-time position of the target object within the user coordinate system, captured by an eye tracker.

TABLE I
THE FIVE COMMANDS USED TO CONTROL ARA IN GAZE-BASED CONTROL

Gaze Action             Command
Looking Upwards         Upward Displacement (away from the object)
Looking Downwards       Downward Displacement (toward the object)
Looking Left            Gripper Closure
Looking Right           Gripper Release
Sustained Gaze (3 s)    Reach

The system decodes $T^{Qr}_{user}$ by analyzing the QR codes in images obtained from the eye tracker. Despite the built-in calibration of the eye-tracking device, observational data tend to deviate persistently in a fixed direction due to the device's systematic errors and individual variations in users' eyes. To enhance the accuracy of gaze points, an error matrix, $T_{\delta}$, is introduced for correction. This matrix remains constant for each user and is determined based on specific tests conducted after calibration. The proposed unified coordinate system is designed to enhance the independence of individuals with high-level spinal cord injuries; a numeric sketch of the full transformation chain follows.
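A minimal numeric rendering of Eqs. (1)-(2), including the per-user correction $T_{\delta}$; the function and variable names are illustrative, and applying $T_{\delta}$ in the user frame is an assumption.

```python
import numpy as np

def gaze_to_ara_target(T_ARA_Qr, T_Qr_user, p_user, T_delta=np.eye(4)):
    """Map a smoothed 3D gaze point from the user (eye-tracker) frame
    into the ARA frame via the QR linking frame, per Eqs. (1)-(2)."""
    p = np.append(np.asarray(p_user, dtype=float), 1.0)  # homogeneous point
    p = T_delta @ p                 # per-user systematic-error correction
    return (T_ARA_Qr @ T_Qr_user @ p)[:3]

# T_ARA_Qr is the fixed ARA-to-QR calibration matrix; T_Qr_user is
# decoded at run time from the QR code seen by the scene camera.
```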
2) Gaze-Based Control: This study also compares the hybrid gaze-FEMG control method with a pure gaze-based control method. Gaze-based control encompasses five types of commands, as shown in Table I. Looking upwards indicates 'Upward Displacement' (away from the object); looking downwards indicates 'Downward Displacement' (toward the object); looking left commands 'Gripper Closure'; looking right denotes 'Gripper Release'; and a sustained gaze (3 s) signifies 'Reach'. These commands and intentions are essential for daily tasks and are controlled solely by gaze signals. Prior to the experiment, the 3-second gaze variance (a measure of fixation stability) and the trigger thresholds for each direction are measured and calibrated individually for each user. A sketch of this command logic follows.
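One hedged way to realize the Table I command set: gaze offsets from a reference point trigger the four directional commands, and the variance of a 3 s window detects a sustained gaze. The threshold values and axis conventions below stand in for the per-user calibrated parameters.

```python
import numpy as np
from collections import deque

class GazeCommandDetector:
    """Map gaze behaviour to the five commands of Table I.

    dir_thresh and var_thresh are placeholders for the per-user
    calibrated trigger thresholds and 3 s gaze-variance bound.
    """

    def __init__(self, rate_hz=40, dwell_s=3.0,
                 dir_thresh=0.15, var_thresh=1e-4):
        self.window = deque(maxlen=int(rate_hz * dwell_s))
        self.dir_thresh = dir_thresh
        self.var_thresh = var_thresh

    def update(self, gaze_xy, reference_xy):
        """gaze_xy: gaze point projected to the frontal plane (y up);
        returns a command string, or None if no trigger fires."""
        self.window.append(np.asarray(gaze_xy, dtype=float))
        dx, dy = np.asarray(gaze_xy) - np.asarray(reference_xy)
        if dy > self.dir_thresh:
            return "UPWARD_DISPLACEMENT"    # looking upwards
        if dy < -self.dir_thresh:
            return "DOWNWARD_DISPLACEMENT"  # looking downwards
        if dx < -self.dir_thresh:
            return "GRIPPER_CLOSURE"        # looking left
        if dx > self.dir_thresh:
            return "GRIPPER_RELEASE"        # looking right
        if (len(self.window) == self.window.maxlen
                and np.var(np.stack(self.window), axis=0).sum() < self.var_thresh):
            return "REACH"                  # sustained 3 s gaze
        return None
```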

C. FEMG Recognition
In this section, the recognition of users' active intentions through FEMG is introduced. First, the four FEMG datasets necessary for hybrid control are collected. This is followed by training and storing a CNN model. During the online recognition phase, the trained model is loaded to recognize real-time FEMG activity.
1) Data Acquisition and Preprocessing: In hybrid control, the intention originates from voluntary facial muscle activities.
To enable patients with high-level spinal cord injuries to use the system across multiple scenarios and enhance independent living, FEMG data are acquired using the wireless Delsys EMG recognition system (Delsys, USA). For FEMG recognition, five wireless EMG sensors are placed on the user's left masseter, right masseter, left platysma, right platysma, and frontalis muscles. These five channels cover a range of facial muscle activities, including tilting the mouth to the left, tilting the mouth to the right, teeth clenching, and eyebrow raising. These activities correspond to the four commands in hybrid control: confirming gaze point (CG), approaching the object (AO), grasping the object (GO), and releasing the object (RO), as illustrated in Fig. 1.
2) Convolutional Neural Network Architecture: A lightweight CNN was designed specifically to infer users' FEMG intentions. First, FEMG signals are segmented using a smooth sliding window with a length of 50, resulting in 50 × 5 (temporal × channel) two-dimensional EMG samples. Given CNNs' proficiency in recognizing two-dimensional matrices, a lightweight network structure is designed to ensure real-time control. The CNN framework comprises three convolutional layers and two fully connected layers, as shown in Fig. 2. The convolutional layers employ 32, 64, and 128 filters of size 3 × 1, respectively, maintaining uniform filter dimensions across layers. The subsequent max pooling layers, with window sizes of 5 × 1 and 2 × 1 respectively, facilitate dimensionality reduction and feature consolidation. The network concludes with two fully connected layers, which transform the multidimensional feature map into a 5-dimensional output (encompassing four directive FEMG patterns and one neutral FEMG pattern) appropriate for classification. This design effectively captures and processes signal characteristics, ensuring efficient feature extraction and representation for precise classification.
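A PyTorch sketch consistent with the stated design: 50 × 5 input windows, three 3 × 1 convolutions with 32/64/128 filters, 5 × 1 and 2 × 1 max pooling, and two fully connected layers producing five class scores. The placement of the pooling layers and the hidden FC width are assumptions, as the paper gives only the layer inventory.

```python
import torch
import torch.nn as nn

class FEMGNet(nn.Module):
    """Lightweight CNN for 50x5 (time x channel) FEMG windows,
    outputting 5 classes (four commands + neutral)."""

    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(5, 1)),   # 50 -> 10 along time
            nn.Conv2d(32, 64, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),   # 10 -> 5 along time
            nn.Conv2d(64, 128, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 5 * 5, 128),        # hidden width assumed
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                       # x: (batch, 1, 50, 5)
        return self.classifier(self.features(x))

model = FEMGNet()
window = torch.randn(1, 1, 50, 5)               # one sliding-window sample
logits = model(window)                          # shape (1, 5)
```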
3) FEMG-Based Control: The hybrid gaze-FEMG control method is experimentally compared with pure FEMG-based control. In the absence of gaze for positioning, and relying solely on FEMG for ARA control, commands for six directions and the gripper are required. As a result, seven facial muscle patterns are needed: tilting the mouth to the left, tilting the mouth to the right, left cheek blow, right cheek blow, cheek stretch, eyebrow raise, and teeth clenching. These correspond to the ARA's up and down, front and back, and left and right movements, as well as grasping and releasing, as depicted in Fig. 3 and in the mapping sketch below. The four intention commands of hybrid control and the seven commands of purely FEMG-based control require different facial muscle patterns and the training of distinct neural network models, all of which are performed during the offline collection process.
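Conceptually, pure FEMG-based control reduces to a lookup from recognized facial pattern to an ARA jog or gripper action. The sketch below mirrors Fig. 3; the specific axis assignments, the shared grasp/release toggle, and the jog speed are assumptions, since the paper lists seven patterns for the eight actions without giving the exact pairing.

```python
# Hypothetical mapping from the seven recognized FEMG patterns to ARA
# actions; the 0.05 m/s jog speed is a placeholder, not a paper value.
V = 0.05
FEMG_TO_ACTION = {
    "tilt_mouth_left":  ("jog", (-V, 0.0, 0.0)),   # move left (assumed axis)
    "tilt_mouth_right": ("jog", ( V, 0.0, 0.0)),   # move right (assumed axis)
    "left_cheek_blow":  ("jog", (0.0, -V, 0.0)),   # move back (assumed axis)
    "right_cheek_blow": ("jog", (0.0,  V, 0.0)),   # move front (assumed axis)
    "cheek_stretch":    ("jog", (0.0, 0.0, -V)),   # move down (assumed axis)
    "eyebrow_raise":    ("jog", (0.0, 0.0,  V)),   # move up (assumed axis)
    "teeth_clench":     ("gripper", "toggle"),     # grasp / release (assumed)
}
```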

D. Experimental Protocol
Engaged in this study were one patient with high-level spinal cord injury (S1, male, 25 years old) and three healthy participants (S2-S4, two males, one female, aged 22.9 ± 1.2 years). All participants were right-handed, with S1 having been right-handed prior to paralysis. The ARA was therefore affixed to the right side of the wheelchair, adjacent to the user's right hand. Each participant was capable of comprehending instructions and providing informed consent. Participant S1 had sustained a cervical spine fracture-induced spinal cord injury, leading to severe limb mobility impairment and limited neck movement, but his facial muscle function remained intact. All participants possessed normal or corrected-to-normal vision. Prior to any experimentation, written informed consent was obtained from all participants. The experimental protocol was approved and conducted under the supervision of the SUSTech Medical Ethics Committee (approval number: 20210009, date: 2021/3/2).
The presented hybrid control method offers a solution for daily tasks for individuals with high-level spinal cord injuries. This study demonstrates the successful execution of eating and pouring tasks, which are illustrated in Fig. 4. Through the hybrid interface, users swiftly locate targets and efficiently manipulate the ARA. These tasks encapsulate the challenges that individuals with high-level spinal cord injuries urgently need to surmount for independent daily living. Our hybrid control approach provides robust and efficient assistance in task completion. Detailed performances of these tasks can be observed in the accompanying video (Supplementary Video).
In the experimental section of this paper, the evaluation and comparison of the proposed methods focus on the classical pick-and-place task, a widely recognized experiment for assessing HCI systems. Participants were required to pick up a designated object and place it at a specified location, as illustrated in Fig. 5. Prior to the formal experiments, participants engaged in approximately five trial runs, each lasting around 10 minutes, to familiarize themselves with the system. During the official experiments, the initial position of the ARA and the locations for object pickup and placement were kept consistent. The sequence of objects manipulated across the three control method experiments was identical. Each participant repeated the pick-and-place task six times, providing reliable data on the time required for each stage of the task: reaching, grasping, transporting, and placing the object.

E. Training and Evaluation
Before commencing the task, offline FEMG recognition and gaze point accuracy assessments are conducted to ensure precise position decoding and intention estimation. The evaluation of FEMG recognition involves separate testing of the four control commands under hybrid control and the seven intention commands under pure FEMG-based control. Both control strategies employ similar methodologies for data collection, preprocessing, and model training, although they differ slightly in the number of commands recognized. For each user, 20 seconds of FEMG data for each facial expression are collected to train the model, with the datasets split into 75% for training and 25% for testing. Classification accuracy is calculated by comparing the predicted commands with the actual intentions of the users, as sketched below.
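The offline evaluation loop can be sketched as follows, reusing the FEMGNet class from the earlier sketch; the 75/25 split matches the paper, while the random placeholder data, optimizer, and epoch count are illustrative.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split

# X: (n_windows, 1, 50, 5) FEMG windows, y: integer command labels.
# Random placeholder data stands in for a user's 20 s recordings.
X = np.random.randn(400, 1, 50, 5).astype(np.float32)
y = np.random.randint(0, 5, size=400).astype(np.int64)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # 75/25 split

model = FEMGNet()                       # CNN from the earlier sketch
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
for _ in range(20):                     # epoch count is illustrative
    opt.zero_grad()
    loss = loss_fn(model(torch.from_numpy(X_tr)), torch.from_numpy(y_tr))
    loss.backward()
    opt.step()

with torch.no_grad():
    preds = model(torch.from_numpy(X_te)).argmax(dim=1).numpy()
accuracy = (preds == y_te).mean()       # predicted vs. actual intentions
```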
The accuracy measurement of gaze points in this study is divided into two parts: within the user coordinate system and within the ARA coordinate system, with each measurement repeated six times for reliability. In the user coordinate system, a calibration board with marked coordinates is placed 0.5 meters directly in front of the user, a common distance for high-level spinal cord injury patients when manipulating objects on a desktop; users typically look directly at objects during manipulation. The system calculates the deviation between the actual gaze point and the gaze point identified by the eye tracker, determining the error matrix $T_{\delta}$ and assessing its accuracy. Measurements begin with the initial uncalibrated accuracy, followed by personalized calibration settings using a built-in algorithm to enhance the post-calibration accuracy.
The average of six measurements is used to compute $T_{\delta}$. In the ARA coordinate system, accuracy is evaluated by comparing the difference between designated gaze points and the actual position of the ARA end-effector. The correction through $T_{\delta}$ aims to improve the accuracy of eye movement control within the ARA coordinate system; a minimal computation of $T_{\delta}$ is sketched below.
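If the systematic error is modelled as a constant translational bias (an assumption; the paper states only that $T_{\delta}$ is fixed per user and derived from six repeated measurements), the correction can be computed as follows.

```python
import numpy as np

# Six measured deviations (gaze estimate minus true board point, metres);
# the values are placeholders for one user's calibration run.
deviations = np.array([
    [0.021, -0.008, 0.004], [0.018, -0.011, 0.006],
    [0.024, -0.007, 0.003], [0.020, -0.010, 0.005],
    [0.019, -0.009, 0.004], [0.022, -0.012, 0.005],
])
bias = deviations.mean(axis=0)          # average of six measurements
T_delta = np.eye(4)
T_delta[:3, 3] = -bias                  # subtract the systematic offset
```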
In this study, accuracy was quantified using the following formula:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

where $TP$, $TN$, $FP$, and $FN$ denote the total numbers of true positive, true negative, false positive, and false negative samples, respectively.

III. RESULTS
This section reports the performance of the individual modules within the hybrid control system, including the accuracy of intention recognition and the precision of location decoding. Subsequently, the efficacy of the hybrid control system, integrating FEMG and gaze, in task completion is presented and compared to purely FEMG-based and gaze-based control methods. A further comparison is drawn against existing HCI technologies across multiple dimensions. The section concludes by illustrating the progressive performance improvement of the HCI with practice, accentuating the unique advantages of the hybrid approach.

A. FEMG Recognition
FEMG intention recognition for hybrid control in S1 (the high-level spinal cord injury patient) was evaluated and visualized; the results are illustrated in Fig. 6(a). In the hybrid control mode, the four-category classification achieved an average accuracy of 99.2% ± 0.6% across all participants. Under pure FEMG-based control, the classification of the seven intention commands yielded an average accuracy of 97.7% ± 1.4%. These results demonstrate high precision in intention recognition under both hybrid and pure FEMG control modes, highlighting the system's robustness and efficiency in handling diverse command types. The increase in the number of classes heightens the recognition challenge. For instance, some confusion is observed between tilting the mouth to the left and the right cheek blow, implying potential misrecognition of intentions to move left or forward in real-time FEMG-based control.

B. Eye-Tracking System Accuracy and Real-Time Performance
The accuracy of gaze point tracking is progressively enhanced via calibration and unified coordinate system transformations. This is determined by measuring gaze point errors in both the user's and the ARA's coordinate systems. Table II reports the initial error (I-error), the post-calibration error (PC-error), and the post-unified-transformation error (PUT-error) for each user. Results from the experiments indicate that calibration substantially reduces the gaze point recognition error for every user, achieving an average improvement of 46.2%. However, it was observed that most users still exhibit a stable error in a certain direction post-calibration, which is challenging to rectify via calibration alone. By adjusting the bias component within the unified system, the accuracy of gaze point recognition can be further enhanced by 12.3%. The unified system not only harmonizes the coordinates of the user and the ARA, but also bolsters the overall system accuracy, thereby augmenting the efficiency of the grasping task.

C. Comparative Analysis With Individual Control Methods
To assess the performance differences between the hybrid control, gaze-based control, and FEMG-based control, a one-way analysis of variance (ANOVA) was conducted to statistically highlight the significant disparities among these control methods. A Bonferroni post-hoc test was also performed to determine the significance of pairwise comparisons; a sketch of this analysis is given below. The high-level spinal cord injury patient (S1) in our study had complete eye movement and FEMG signals, similar to the healthy participants, resulting in minimal variability between the two groups. In the pick-and-place task, the hybrid control method enabled users to complete the task stably (Fig. 7), with an average completion time of 34.3 s, a 28.8% improvement compared to gaze-based control (48.2 s, P < 0.001), and a 21.8% improvement compared to FEMG-based control (43.9 s, P < 0.05).
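The statistical analysis can be reproduced along these lines with SciPy; the arrays hold placeholder completion times, not the study's raw data, and the manual Bonferroni correction stands in for whichever software the authors used.

```python
from itertools import combinations
import numpy as np
from scipy import stats

times = {   # completion times (s); placeholder values only
    "hybrid": np.array([35.1, 33.8, 34.9, 33.5, 34.6, 33.9]),
    "gaze":   np.array([49.0, 47.1, 50.3, 46.8, 48.5, 47.5]),
    "femg":   np.array([44.2, 43.1, 45.0, 42.8, 44.5, 43.8]),
}

F, p = stats.f_oneway(*times.values())          # one-way ANOVA
pairs = list(combinations(times, 2))
alpha = 0.05 / len(pairs)                        # Bonferroni correction
for a, b in pairs:
    t, p_pair = stats.ttest_ind(times[a], times[b])
    print(f"{a} vs {b}: p = {p_pair:.4g}, significant = {p_pair < alpha}")
```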
In the experiments, it was observed that the average time required for FEMG-based control was longer than for hybrid control, primarily due to the need for visual feedback and continuous planning during the execution of motor tasks. The time taken for gaze-based control was longer than that for FEMG-based control, and the standard deviation of its completion time was larger, indicating its instability. Therefore, the hybrid control method provides a more stable and efficient HCI for individuals with high-level spinal cord injuries.
A detailed report of the time required for each task stage (Reach, Grasp, Transport, Placement) using the three control methods is presented in Fig. 8. Each of these four processes was analyzed using a one-way ANOVA and Bonferroni post-hoc test. For the Reach process, the hybrid control method demonstrated significant improvements, with a 25.9% enhancement over FEMG-based control and a 35.8% improvement over gaze-based control (p < 0.05). In the Transport process, the advantage of hybrid control was even more pronounced, showing a 37.2% increase compared to FEMG-based control (p < 0.01) and a 45.4% improvement relative to gaze-based control (p < 0.001). However, no significant differences were observed between the three control methods during the Grasp and Placement processes (p > 0.05). These findings indicate that the hybrid control method yields greater benefits during stages requiring continuous positional control, highlighting its effectiveness in dynamically managing task execution. Table III consolidates the average results of each stage across the multiple trials performed by each participant, further supporting these observations.

D. Performance Improvement With Practice
The experimental times for all three methods (hybrid control, FEMG-based control, and gaze-based control) gradually decrease and stabilize as users become familiar with the tasks, as depicted in Fig. 9. This trend suggests that participants can enhance their task completion performance through practice. T-tests were performed on the initial and final trial times for each of the three methods, as sketched below. The performance of the hybrid control method improves with an increasing number of practice trials, reaching up to a 44% efficiency improvement (p < 0.01). This improvement is attributed to the method's real-time responsiveness, intuitiveness, and accuracy, which make it easy to learn and master. However, it is noteworthy that for gaze-based control, despite multiple practice sessions, significant fluctuations in performance persist due to the inherent instability of oculomotor physiological signals. Together, these findings provide valuable insights into the learning curve and long-term applicability of each method.
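The practice-effect analysis reduces to a paired comparison of first- and last-trial times, as sketched below with placeholder values, not the study's data.

```python
import numpy as np
from scipy import stats

# First- and sixth-trial completion times per participant (s);
# placeholder values for illustration only.
first_trial = np.array([52.0, 48.5, 50.2, 47.8])
last_trial  = np.array([30.1, 28.4, 29.6, 27.9])

t, p = stats.ttest_rel(first_trial, last_trial)   # paired t-test
improvement = 1.0 - last_trial.mean() / first_trial.mean()
print(f"p = {p:.4g}, mean efficiency gain = {improvement:.0%}")
```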

IV. DISCUSSION
This hybrid control method, which integrates gaze and FEMG, has proven to be effective and efficient in assisting individuals with high-level spinal cord injuries in performing ADLs. The system's high-precision positional decoding and accurate intent recognition are manifested in tasks such as eating, pouring, and pick-and-place. The reduction in task completion time, compared to standalone control methods (FEMG-based control and gaze-based control) and other hands-free interfaces, validates its efficiency. A comparative analysis was conducted against representative HCI systems specifically designed for individuals with high-level spinal cord injuries. The comparison criteria, as shown in Table IV, included the success rate of intent recognition or task completion (SR), type of task (TT), and task completion time (TCT). In comparison with established HCI methods, the proposed hybrid control system offers several distinct advantages. Our hybrid control method seamlessly integrates gaze and FEMG signals, yielding a control interface markedly more efficient than those relying solely on EMG, such as FEMG-based control [20], [32]. Tasks requiring fine motor control, such as drawing and pouring water, are akin to pick-and-place tasks, demanding high accuracy in FEMG signal recognition. While brain-computer interfaces (BCIs) based on EEG are widely applicable to individuals with severe spinal cord injuries, the current efficiency of EEG decoding is still suboptimal, as noted in [33]. Furthermore, for those with severe spinal cord injuries, everyday tasks such as eating, pouring water, and object manipulation pose significant challenges; these tasks are not only difficult but also of critical importance to the patients. The studies in [2] and [9], which utilize the same robotic arm with a similar workspace as our system, focus on the task of picking up everyday items. Our hybrid control method, with an average completion time of 34.3 seconds, demonstrates significantly better performance in these tasks compared to the outcomes reported in those studies. Ultimately, the strength of this hybrid control system lies in its integration of gaze and FEMG signals, thereby enhancing the stability and efficiency of task execution for individuals with high-level spinal cord injuries.
The decision to combine FEMG signals with gaze control in our system is based on two key considerations. First, EMG signals precede the physical manifestation of facial expressions, meaning that EMG can capture a user's intent at the very onset of a facial movement, offering seamless and immediate confirmation of intent. In contrast, vision-based methods typically require the completion of the facial expression, leading to potential delays. Second, some facial patterns, such as clenching teeth, can visually resemble a neutral expression, leading to possible confusion in intent recognition; this similarity makes it challenging to accurately discern user intentions based solely on visual cues. Therefore, this integration not only improves the system's accuracy in interpreting user intentions but also enhances its responsiveness and user experience.
When used individually, both FEMG-based control and gaze-based control have their respective advantages and limitations. FEMG-based control leverages voluntary facial muscle contractions, which can be detected even in individuals with severe spinal cord injuries. However, the large number of required control commands leads to a decrease in classification accuracy, reducing its reliability. Moreover, pure FEMG-based control is a process-based method that demands users maintain a high level of concentration and real-time control during task execution, increasing both the cognitive burden and the time needed for control. On the other hand, gaze-based control provides users with a direct and intuitive method to indicate their targets. However, the instability of eye movements makes it challenging to accurately extract intentional gaze signals; the accuracy of intention recognition in the experimental tasks was low, and users found it difficult to master. Integrating user feedback, it becomes evident that, compared to the individual control modes, the hybrid control method offers a more intuitive, effortless, and accurate interactive experience. This hybrid approach combines the strengths of both FEMG and gaze control and was preferred by users for its enhanced interaction quality and reduced cognitive load.
The hybrid control system enhances the user interface by integrating gaze and FEMG signals, achieving a "what you see is what you get" control experience that significantly increases user satisfaction in human-machine interaction. Participants reported post-experiment that the hybrid control system was the easiest to use and required the least learning effort, leading to significantly improved task completion times compared to single-modality control. Additionally, accurate intention recognition contributes to higher user satisfaction, because the system correctly interprets users' intended actions and executes them effectively, resulting in a more satisfying and empowering user experience. In future work, we plan to deepen research in this area to continuously refine the system design and further elevate overall user satisfaction, paving the way for more refined and user-centered developments in hybrid control technologies.

V. CONCLUSION
This paper introduces an innovative hybrid control system that combines gaze and FEMG to improve the usability and efficiency of HCI for individuals with high-level spinal cord injuries. The hybrid system overcomes the limitations of previous gaze and FEMG control methods, enhancing stability and accelerating the completion of daily tasks. The research demonstrates the system's significant potential for real-world application, as it enables users to perform daily tasks, such as eating, pouring water, and pick-and-place, more efficiently. The experimental findings show that the average task completion time for pick-and-place activities using our hybrid control method is 34.3 s, a significant enhancement in efficiency: a 28.8% improvement over the pure gaze-based control method and a 21.8% improvement over the pure FEMG-based control method. Furthermore, the system's performance improves with use, with participants experiencing up to a 44% efficiency gain with continued practice. This innovative hybrid control system, offering a precise and reliable intention interface, holds great promise for improving the daily lives and independence of individuals with severe disabilities.

Fig. 2. The lightweight CNN architecture is designed to recognize FEMG patterns and infer users' active intentions. It consists of three convolutional layers and two fully connected layers, with the output layer generating a distribution over class labels.

Fig. 3. The FEMG-based control method utilizes a pattern of seven facial muscle activations: tilting the mouth to the left, tilting the mouth to the right, left cheek blow, right cheek blow, cheek stretch, eyebrow raise, and teeth clenching. These activations correspond to the ARA's movements for up and down, front and back, and left and right, as well as grasping and releasing. The images used in this study have been obtained with written consent from the subjects, allowing the display of their identifiable features.

Fig. 4. This hybrid system has demonstrated successful execution of eating and pouring tasks. Through the integrated interface, users are able to swiftly pinpoint the target and effectively operate the ARA. The images used in this study have been obtained with written consent from the subjects, allowing the display of their identifiable features.

Fig. 5. The proposed method was evaluated and compared using a pick-and-place task. In this task, various objects were placed on a table, and participants were randomly instructed to pick up and place objects in their corresponding locations. The images used in this study have been obtained with written consent from the subjects, allowing the display of their identifiable features.

Fig. 6. FEMG intention recognition for hybrid control was evaluated and visualized for S1 (the high-level spinal cord injury patient). (a) The classification confusion matrix, displaying the accuracy of each intention command classification with numerical values. (b) The distinctions arising from the feature extraction process among the four types of FEMG, visualized by t-distributed stochastic neighbor embedding (t-SNE).

Fig. 8. The detailed report of the time required for each stage (Reach, Grasp, Transport, Placement) using the three control methods in the pick-and-place task.

Fig. 9. The performance of the three control methods improves with an increasing number of practice trials. The data reflect the average completion times of the four subjects for each iteration of the task.

TABLE III
THE AVERAGE RESULTS OF EACH STAGE (REACH, GRASP, TRANSPORT, PLACEMENT) FOR MULTIPLE TRIALS PERFORMED BY EACH PARTICIPANT USING THREE DIFFERENT CONTROL METHODS IN THE PICK-AND-PLACE TASK

TABLE IV
PERFORMANCE METRICS FOR ALTERNATIVE HANDS-FREE HCI. SR: SUCCESS RATE OF INTENT RECOGNITION OR TASK COMPLETION; TT: TYPE OF TASK; TCT: TASK COMPLETION TIME