Semi-Autonomous Control of Prosthetic Hands Based on Multimodal Sensing, Human Grasp Demonstration and User Intention

Semi-autonomous control strategies for prosthetic hands provide a promising way to simplify and improve the grasping process for the user by adopting techniques usually applied in robotic grasping. Such strategies endow prosthetic hands with the ability to autonomously select and execute grasps while keeping the user in the loop to intervene at any time for triggering, accepting or rejecting decisions taken by the controller in an intuitive and easy way. In this paper, we present a semi-autonomous control strategy that allows the user to perform fluent grasping of everyday objects based on a single EMG channel and a multi-modal sensor system embedded in the hand for object perception and autonomous grasp execution. We conduct a user study with 20 subjects to assess the effectiveness and intuitiveness of our semi-autonomous control strategy and compare it to a conventional electromyography-based control strategy. The results show that the workload is reduced by 25.9 % compared to conventional electromyographic control, the physical demand is reduced by 60 % and the grasping process is accelerated by 19.4 %.


Introduction and Related Work
Recent advances in prosthetics and humanoid robotics have led to artificial hands with human-like appearance as well as improved dexterity and grasping abilities [1,2]. Hence, simple yet reliable control strategies are needed to enable users to exploit the hand's dexterous grasping abilities to their full extent while reducing the amount of attention they have to pay during the execution of a grasp [3,4]. Traditionally, electrically actuated prostheses are controlled with signals captured by two EMG electrodes attached in the socket on the user's arm. By contracting the muscles in the forearm, the user can then sequentially control all degrees of freedom of the prosthesis. While prosthetic hands are becoming increasingly versatile, controlling these added degrees of freedom is difficult with the limited expressiveness provided by the EMG interface. Hence, long control signal sequences are needed to control prostheses with a multitude of functions. Besides requiring a long training time, the direct control of more than two degrees of freedom (DoF) with only two EMG electrodes results in a high cognitive load for the user while controlling their device [5]. Therefore, the prosthetic control strategy needs to be simplified in order to reduce the user's workload while operating their device.
An active field of research is the classification of electromyography (EMG) and mechanomyography (MMG) signals from multiple sensors to improve upon the muscle activation control strategies currently applied in commercial prostheses, as for example proposed in [6,7,8,9,10,11,12]. A comprehensive survey of these techniques can be found in [13]. However, acquiring fine-grained, continuous and robust signals is challenging due to imperfect fitting of the socket and changing skin surface conditions, such as sweat and temperature [14]. Therefore, the emerging field of semi-autonomous control concentrates on reducing the number of commands the user has to send to execute an action by incorporating environmental information extracted from additional sensor modalities and predicting the user's intention. Such approaches are especially interesting where the user's stump condition does not permit capturing feature-rich EMG signals. For this specific user group, a prosthetic control should require as few direct EMG commands as possible to mitigate the proneness to errors caused by wrong or missing muscular signal detection.
The idea of partially automating prosthetic control has a long history. For an early version of the Southampton Hand, predefined grasps are adapted based on information from a gyroscope, force sensors and slip detection supporting the user during grasping [15].
Došen et al. [16] design a semi-autonomous control scheme based on a cognitive vision system for prosthetic grasping. With a camera and a distance sensor mounted externally on the dorsal side of a prosthetic hand looking over the fingers, the object is detected and its distance can be measured. Here, the user directly controls the wrist rotation, while grasp type and hand aperture are determined based on visual and distance information and a set of if-then rules of a decision making system. While offering nine different grasp types and apertures, the system achieved an accuracy of 84 %. Grasping failures were attributed to errors of the visual object detection. The work was extended to include the wrist rotation into the semi-autonomous control scheme, leaving the user only responsible for triggering the grasping action [17].
Another approach using electrooculography and four sensors placed around the eye to determine grasp affordances was presented by Hao et al. [18]. The user has to scan the object's borders with the eyes and trigger the closing movement by EMG signals as soon as the desired preshape is obtained. While grasping in a defined setup shows an object recognition rate of 86.2 %, the robustness regarding different object-eye distances remains to be assessed.
Markovic et al. [19] present a semi-autonomous control architecture that is based on augmented reality glasses. In contrast to the approaches described above, the user stays in control of the fine-tuning of the grasp. While a first preshape is adopted based on the visual information of the stereo camera system integrated in the glasses, the user is still able to adjust the grasp aperture by a proportional myoelectric controller according to the transmitted feedback. In their following work [20], the authors use an inertial measurement unit (IMU) on the dorsal side of the palm and combine it with a stereo camera system mounted in the room as well as position and force sensors embedded into the prosthetic hand. Based on this multi-modal sensor information, the wrist rotation and grasp preshape of a prosthesis are controlled autonomously. The semi-autonomous control was compared to three manual control schemes with increasing difficulty. Compared to a manual control of grasp type, wrist orientation and finger closing, the grasp execution was faster with the semi-autonomous control.
An electrotactile human-machine interface is proposed by Gonzalez-Vargas et al. to facilitate bidirectional communication between a prosthesis controller and the user [21]. The user intention is detected by monitoring forearm motions with an IMU mounted in the prosthetic socket. This interface presents possible grasps to the user via electrotactile stimulation and the user then acknowledges the desired grasp choice by generating a single command signal. Although this method inherently includes delays due to the required selection process, it proves to be faster than direct proportional control of all individual degrees of freedom for three out of four preshape options.
To assess the performance of different levels of autonomy in prosthetic control regarding grasp success, subjective complexity and satisfaction, several control schemes are applied to the CyberHand [22]. The evaluation shows that less complex control schemes perform notably better in terms of perceived satisfaction, required attention and difficulty. The authors also noted that a full, individual control over all functionalities offered by the prosthesis was seldom used by the subjects. The study thereby supports the general merit of semi-autonomous control techniques.
A semi-autonomous control scheme with a multi-electrode user interface is shown on the TASKA hand [23]. Using combined force and proximity sensing at the fingertips, the finger closing motion and the grasp force applied to the object are controlled autonomously. The user controls the grasping motion similar to a pure multi-electrode manual control and the autonomous control system is activated based on a threshold set on the decoded muscle signals. This shared semi-autonomous control is shown to increase the grasp precision and decrease the workload for the user.
The use of cameras attached to a prosthetic hand has been studied in literature, resulting in several approaches for processing visual information for the application in semi-autonomous prosthetic control. A pipeline architecture is used in the cognitive vision system to derive object dimensions from the image input [16]. A number of recently published object recognition systems make use of neural networks and propose grasps based on the recognized known objects [24,25,26].
Sensor modalities used in prosthetic control range from IMU data [27] over stereo vision [20] to distance sensors [16,23]. A survey on the sensorization of both robotic and prosthetic hands can be found in [28]. In general, many semi-autonomous control algorithms rely on sensory information not directly provided by the prosthesis. Instead they require additional sensors attached to the human body or installed in the environment. Grasp types for different objects are usually designed manually. Furthermore, the distinct degrees of freedom of the prosthetic hand are generally actuated in succession starting with wrist and thumb positioning followed by the final hand closure. This leads to a slower grasp execution compared to simultaneous actuation of all degrees of freedom of the prosthesis.
In this paper, we propose a novel semi-autonomous control scheme for grasping with prosthetic hands and evaluate its performance on an improved version of the KIT Prosthetic Hand [29]. We consider the presented semi-autonomous control, which uses hand state estimation, object recognition and user intention based on sensors and processing power directly integrated into the hand, as the main contribution of our work. The control scheme uses the sensor system integrated into the prosthesis to extract relevant object information, select an appropriate grasp and recognize the user's intention. Except for two EMG electrodes, no additional sensors are attached to the human body or mounted in the environment. This allows grasping tasks including wrist rotation, finger and thumb closing to be executed in a continuous manner based on only one user input signal that triggers the grasp execution process. Feedback to the user regarding automatically selected grasps is provided via a color display embedded into the dorsal side of the prosthesis. The grasp trajectories are generated from human grasping demonstrations on known objects of daily life. By transferring autonomous robot grasping functionalities to prosthetic hands, we aim to simplify the grasp process for the user and reduce their workload throughout the control of their hand. In contrast to the state of the art, the presented control scheme relies solely on sensors and computing power integrated into the prosthesis. This makes the system applicable in daily life without the need for external sensors attached to the user's body or the environment. Control of the prosthesis is facilitated by a single user input signal and an IMU. The proposed control scheme is evaluated in a real-world experiment regarding grasp time and success as well as cognitive burden for the user.
The semi-autonomous control scheme is compared to a traditional, fully user-controlled approach as well as a hybrid control scheme that provides autonomous grasp suggestions while the hand closing is controlled manually by the user. Both the hybrid and the semi-autonomous control scheme are developed and introduced in this work to assess the merit of different levels of autonomy in prosthetic grasp control.
In Section 2 we briefly describe the used prosthetic hand with its embedded sensor system, processing capabilities and resource-aware image processing. Section 3 explains the human grasp database that is directly learned from human motion recordings. The proposed semi-autonomous control is then detailed in Section 4. Section 5 describes the experimental setup used to evaluate the proposed control and Section 6 presents the results of these experiments. The paper concludes with a discussion of the semi-autonomous control and its experimental evaluation.

The Prosthetic Hand and Its Grasping Abilities
The semi-autonomous control scheme presented in this work is implemented on the female KIT Prosthetic Hand which is based on the prosthesis presented in [29]. In the following we describe the prosthesis mechanics and processing system, as well as its sensor setup for completeness. In addition, the intelligent functionalities provided by the prosthesis are presented. These include an object recognition with a camera in the palm of the prosthesis based on previous work [26], as well as a new database of human grasp motions and its transfer to the prosthetic hand.

The Mechatronics
For the implementation and evaluation of the semi-autonomous control scheme, our female version of the prosthesis is used [29]. The system components of the prosthetic hand and shaft are shown in Fig. 1. The hand is driven by two motors actuating the thumb and the four fingers, respectively, as shown in Fig. 2. A cascaded P-controller for position and velocity control is used to drive the motors. The hand has two degrees of actuation (DoA) and ten degrees of freedom (DoF) with two joints for flexion and extension in each finger. The fingers are connected to the motor by a force distributing mechanism based on the TUAT/Karlsruhe mechanism that allows the fingers to wrap around arbitrarily shaped objects as described in [30] and [31]. The prosthetic hand has been developed to be personalizable in size and grasping abilities. It is equipped with a multi-modal sensor system that allows the realization of intelligent grasping behaviors that are easy to use and tailored to the user's needs. The prosthesis is sized according to the 50th percentile of female hands conforming to the German Standard Specification (DIN 33402-2) and has a weight of 377 g. In a cylinder grasp the prosthesis provides a grasp force of 24.2 N. Besides relative encoders on both motors (IEH2-512, Faulhaber), an RGB camera and an IMU sensor (BNO055, Bosch Sensortec), the prosthesis also comprises a distance sensor (VL53L1X, ST) embedded into the palm. The camera and distance sensor are shown in gold in Fig. 2.
Figure 1: Underactuated hand with two motors and ten degrees of freedom (DoF); a sensor system consisting of camera, distance sensor and IMU; a color display and an embedded system for sensor data processing and control. In addition, the wrist rotation unit and user interface, which are integrated in a self-experience shaft, are shown.
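As an illustration of the cascaded P-control mentioned above, the following minimal sketch shows one control step in which an outer position loop produces a velocity setpoint and an inner velocity loop produces a motor command. The function names, gains, limits and units are assumptions made for this example and are not taken from the prosthesis firmware.

```python
def clamp(value: float, limit: float) -> float:
    """Limit value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def cascaded_p_step(
    target_pos: float,
    measured_pos: float,
    measured_vel: float,
    kp_pos: float = 5.0,   # hypothetical outer-loop gain
    kp_vel: float = 0.5,   # hypothetical inner-loop gain
    max_vel: float = 2.0,  # hypothetical velocity limit
    max_cmd: float = 1.0,  # hypothetical normalised command limit
) -> float:
    """Return a normalised motor command for one control step."""
    # Outer loop: position error is turned into a velocity setpoint.
    vel_setpoint = clamp(kp_pos * (target_pos - measured_pos), max_vel)
    # Inner loop: velocity error is turned into the motor command.
    return clamp(kp_vel * (vel_setpoint - measured_vel), max_cmd)
```

In such a structure the encoder readings would provide both the measured position and, by differentiation, the measured velocity.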
The developed algorithms for sensor data processing and control are running on an on-board embedded system with a 400 MHz ARM Microcontroller integrated into the prosthesis palm. This integration allows using the prosthesis in standalone mode without the need for any external computing power, sensors or internet connection.
To allow the inclusion of able-bodied subjects in the experimental evaluation of the semi-autonomous control strategy developed in this work, a self-experience shaft was designed. It is used to attach the prosthesis below the arm at the palmar side of the human hand as depicted in Fig. 2 and Fig. 3. This setup allows the execution of grasping actions by able-bodied subjects under conditions comparable to amputated users. The self-experience shaft is connected to the prosthesis by a quick release fastener. The wrist is actuated by a motor providing a pronation motion of 90° and a supination motion of 180°. It thereby spans the human range of motion of forearm pronation and supination combined with passive shoulder rotation [32] and [33]. The wrist rotation is also directly controlled by the on-board embedded system of the prosthesis and, similar to the hand motors, a P-controller is used for wrist motor control. The shaft further contains the battery powering the hand-wrist system as well as the two EMG electrodes (13E200, ottobock), which are used to measure the excitation of wrist flexor and extensor muscles. The EMG electrodes provide internal filtering of parasitic signals. They operate in a frequency bandwidth of 90 Hz to 450 Hz. The conditions of the EMG detection of muscular signals are kept constant over all implemented control schemes.
Figure 2: Technical rendering of the prosthetic system with the hand and self-experience shaft; the motors actuating the fingers (blue), thumb (red) and wrist (green) as well as the distance sensor and camera (gold) are shown; the IMU is mounted on the PCB in the back of the hand.

Visual Object Recognition for Prosthetic Hands
To endow prosthetic hands with the ability to autonomously perform parts of grasping tasks, we transfer a vision-based approach to grasping from robotics to the prosthetic application. Solutions to the robotic problem of autonomously selecting a suitable grasp on an object of interest can help to reduce the workload of the user in prosthetics. This is our motivation for integrating a camera in the prosthesis, i.e. to transfer our robot grasping knowledge to prosthetics. Given an object of interest that can be recognized with computer vision methods and an object database of daily objects, the prosthesis should be able to autonomously determine grasps and select the most appropriate one. The execution of the chosen grasp is then triggered by the user. A fundamental requirement for successfully recognizing objects and planning and selecting grasps is that all computations are performed in real time on the embedded system integrated in the hand.
To achieve this ability, we use a resource-aware visual recognition system, which is based on a convolutional neural network (CNN) running on the in-hand embedded system. It achieves a recognition accuracy of 96.5 % on 13 pre-trained objects from a household environment. The recognition accuracy of the objects used in our evaluation is depicted in Fig. 4. This visual object recognition system is described in detail in [26]. Here, we give a very brief overview for completeness as the recognition of objects in the scene is key and the first step of the semi-autonomous control scheme. Within the presented controller, the object recognition is triggered by the user via a single EMG signal. A camera image is captured and processed by the recognition system to identify the object in the field of view of the prosthesis. Image processing and object detection are performed in 115 ms. The CNN outputs the recognition probability of all 13 objects and the object with the highest recognition probability is chosen. The number of 13 recognized objects provides a viable tradeoff between recognition rate, memory consumption and processing time based on the utilized microcontroller.
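The selection of the most probable object from the CNN output can be sketched as follows. This is a minimal illustration, assuming the network output is available as one probability per object label; the function names and the example labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_objects(
    probabilities: Sequence[float], labels: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs sorted from most to least likely."""
    if len(probabilities) != len(labels):
        raise ValueError("one probability per label is required")
    pairs = list(zip(labels, probabilities))
    # sorted() is stable, so labels with equal probability keep their order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def select_object(probabilities: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest recognition probability."""
    return rank_objects(probabilities, labels)[0][0]


# Hypothetical three-object example instead of the full 13-object output.
example_labels = ["cup", "banana", "pitcher"]
example_probabilities = [0.1, 0.7, 0.2]
best = select_object(example_probabilities, example_labels)
```

Keeping the full ranking rather than only the top entry is convenient when a suggestion has to be replaced by the next most likely object.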
The focus of the visual object recognition as well as our semi-autonomous control scheme is set on free-standing single objects. While the CNN is capable of recognizing objects in front of varying multicolored backgrounds to some extent, the grasping of objects in cluttered environments is out of the scope of our work.

Generating Grasps from Human Demonstrations
In prosthetics, grasps must be stable, predictable and optically unobtrusive. Humans achieve these goals intuitively in their everyday grasping activities.
Human-like prosthesis grasps should align with human expectations of hand behavior and therefore enhance the predictability of a prosthetic hand. Hence, the trajectories of our semi-autonomous grasp control are learned from human demonstration. To this end, we created a grasp database with predefined human grasps on 29 objects from a household and workshop environment for the top and side grasps.

Data Acquisition
To learn from a broad range of human grasp examples, a kinematic study of 510 grasping motions was conducted, performed by 5 healthy subjects (two female and three male) on 29 objects. Objects used in the study are chosen from the KIT Object Database [34] and the YCB Object Set [35]. The objects are chosen to represent several primitive shapes such as cylinders, boxes and spheres, but also include more complex shapes like pitchers, a banana and a bowl. While only a subset of 13 objects is used for the semi-autonomous control scheme, we record a larger set of objects in different sizes, weights and shapes to establish a comprehensive grasp database. In a wider perspective this allows us to derive human grasps for a wide range of objects and thereby enables the personalization of the subset of grasps provided by a semi-autonomous control to the needs of each specific user.
Throughout the grasp recordings, subjects wear a sensorized data glove (CyberGlove III, CyberGlove Systems Inc.) measuring 22 joint angles of the human wrist, palm and fingers and an IMU mounted on the back of the hand recording the hand orientation. The data glove is calibrated by a procedure adapted from Gracia-Ibáñez et al. [36]. Reference postures at defined finger joint angles are taken by pressing against reference blocks. Calibrated finger joint angles are calculated assuming a linear correlation of the sensor readings. Isolated thumb motions involving individual DoFs are measured additionally to identify the cross-correlations between flexion, abduction and circumduction. For the IMU calibration, the hand is positioned upright on the table and a reference sample is used.
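The linear calibration of a single glove sensor can be illustrated with a small least-squares fit. This is a sketch under the assumption that each reference posture yields one raw sensor reading and one known joint angle; the function names and numeric values are hypothetical and do not reproduce the calibration procedure of [36].

```python
from typing import Sequence, Tuple


def fit_linear_calibration(
    raw: Sequence[float], angles_deg: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of angle = gain * raw + offset."""
    n = len(raw)
    if n < 2 or n != len(angles_deg):
        raise ValueError("at least two matching reference postures are required")
    mean_raw = sum(raw) / n
    mean_angle = sum(angles_deg) / n
    variance = sum((r - mean_raw) ** 2 for r in raw)
    if variance == 0.0:
        raise ValueError("reference postures must produce different readings")
    covariance = sum(
        (r - mean_raw) * (a - mean_angle) for r, a in zip(raw, angles_deg)
    )
    gain = covariance / variance
    offset = mean_angle - gain * mean_raw
    return gain, offset


def raw_to_angle(raw_value: float, gain: float, offset: float) -> float:
    """Convert a raw sensor reading to a calibrated joint angle in degrees."""
    return gain * raw_value + offset


# Hypothetical readings taken at reference blocks for 0 and 90 degrees.
gain, offset = fit_linear_calibration([100.0, 200.0], [0.0, 90.0])
```

With more than two reference postures per joint, the same fit averages out measurement noise.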

Human Grasp Recordings
The participants performed the grasping procedure with their right hand. While one subject was left-handed, their motion data showed no significant difference compared to the four right-handed subjects. Participants were on average 24 years old and the mean hand length from the wrist to the tip of the middle finger was 18.5 cm. The study was carried out in accordance with the recommendations of the ethical committee of the Karlsruhe Institute of Technology. The protocol was approved by this ethical committee. All subjects gave written informed consent.
Throughout a recording session, the subject is seated comfortably in front of a table. The medial side of the hand is placed on the table and the thumb is abducted and opposed to the fingers. Subjects are asked to move their hand to the object positioned 29 cm left of the hand, grasp and lift it naturally. Except for spheres and flat objects with a height below 40 mm, all objects are grasped with two different approach directions resulting in a top and a side grasp. To align with the functionality of the prosthesis, the subjects are asked to perform opposition grasps, allowing both power and precision postures. All grasps are executed twice per subject. The recorded data is available in the KIT Whole Body Human Motion Database [37].

Human Data Mapping
To transfer the human finger motions to the prosthetic hand, an endpoint-based mapping approach is applied similar to the approaches in [38] and [39]. The human fingertip trajectory is calculated from the joint angle measurements while the trajectories of the prosthetic fingers are extracted from video data of the hand closing at constant speed. While the thumb motion can be directly transferred to the prosthetic thumb, the four fingers of the prosthesis are driven by a single motor, which does not make a direct mapping possible. Thus, we develop a method to address the transfer of demonstrated grasps to the prosthetic hand. For all grasp types used in our study, the middle finger is included and mostly centred in the second virtual finger opposing the thumb [40], [41]. Therefore, the middle finger trajectory is chosen as representative for the human finger motion.
The entire procedure of grasp transfer onto the prosthesis given an object and a grasp orientation (top/side) is listed in Alg. 1. The calculations are shown for the fingers as an example; the implementation for thumb and wrist is similar. The algorithm has two parameters as input. The first parameter, $H_{obj} := \{(p_i(t), l_i)\}$, is a set of tuples where $p_i(t)$ is the trajectory of the middle finger of subject $i$ for object $obj$. The parameter $t$ indicates that $p_i(t)$ is a time-dependent data series. The second element in the tuple, $l_i$, is the length of the subject's middle finger. This is needed for the normalisation of the trajectory over the different hand dimensions of the subjects. The second parameter, $p_{prosthesis}(t)$, is the measured trajectory of the middle finger of the prosthetic hand, which is needed for the mapping.
The human fingertip trajectories are normalised to the length of the prosthetic fingers (Alg. 1, line 4). All motions on the same object and direction are additionally normalised regarding the execution time (Alg. 1, lines 5-8). The mean trajectory, averaged over all human demonstrations (Alg. 1, line 10), is mapped to the prosthesis trajectory by a nearest neighbour correlation of all trajectory points (Alg. 1, line 12). The wrist rotation is directly transferred from the human grasps to the prosthetic device.
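The steps described above (length normalisation, time normalisation, averaging and nearest-neighbour mapping) can be sketched in a few lines of Python. This is an illustrative reconstruction and not the authors' Alg. 1: it assumes planar fingertip positions, linear interpolation for the time normalisation, and a prosthesis trajectory given as an ordered list of fingertip positions; all names and the sample count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scale(traj: Sequence[Point], factor: float) -> List[Point]:
    """Scale a fingertip trajectory, e.g. to the prosthetic finger length."""
    return [(x * factor, y * factor) for x, y in traj]


def resample(traj: Sequence[Point], n: int) -> List[Point]:
    """Linearly interpolate a trajectory to n samples over normalised time."""
    if len(traj) < 2 or n < 2:
        raise ValueError("need at least two points and two samples")
    out = []
    for k in range(n):
        pos = k * (len(traj) - 1) / (n - 1)
        i = min(int(pos), len(traj) - 2)
        frac = pos - i
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out


def mean_trajectory(trajs: Sequence[Sequence[Point]]) -> List[Point]:
    """Average equally long trajectories point by point."""
    n = len(trajs)
    return [
        (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        for pts in zip(*trajs)
    ]


def nearest_index(point: Point, reference: Sequence[Point]) -> int:
    """Index of the reference point closest to the given point."""
    return min(range(len(reference)), key=lambda j: math.dist(point, reference[j]))


def map_grasp(
    demos: Sequence[Tuple[Sequence[Point], float]],
    prosthesis_traj: Sequence[Point],
    prosthesis_finger_length: float,
    n_samples: int = 50,
) -> List[int]:
    """Map human demonstrations (trajectory, finger length) to prosthesis samples."""
    normalised = [
        resample(scale(traj, prosthesis_finger_length / length), n_samples)
        for traj, length in demos
    ]
    mean = mean_trajectory(normalised)
    return [nearest_index(p, prosthesis_traj) for p in mean]
```

The returned indices refer to samples of the recorded prosthesis trajectory. Under the additional assumption that this trajectory was sampled uniformly while the hand closed at constant speed, each index could be converted to a motor position.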
The grasp trajectories are discretized in steps of 100 ms and are accordingly executed on the prosthetic hand at 10 Hz. The execution of the low-level motor control causes an overall delay of 295 ms in the grasp execution. Given that the overall execution time of a grasp is roughly 10 s, the response time of the low-level motor control and the resulting delay are negligible.
The motor trajectories as well as the corresponding fingertip trajectories of the prosthesis for a grasp on a pitcher are shown in Fig. 5. The important characteristic of the grasps in the database is their continuous representation: they are not defined by a fixed wrist orientation, static preshaping aperture and grasp pose, but instead, all degrees of freedom are controlled by continuous trajectories describing the entire motion throughout both preshaping and grasp acquisition. In contrast to a fixed hand closing with predefined preshape aperture, these continuous trajectories allow for different timing and closing order as well as interactions of the fingers and the thumb with varying, synchronized closing velocities. The third degree of freedom, namely the wrist orientation, is also described by a trajectory executed simultaneously with the finger and thumb closing motions. While the global reorientation of the hand according to the grasp orientation is performed early in the preshaping phase, this wrist motion trajectory over all grasp phases enables further adjustment in orientation to ease the final grasp acquisition.
Algorithm 1, lines 11-13:
11: for all $\tau$ do
12:     $p_{mapped}(\tau) := \text{nearest neighbour}(p_{mean}(\tau), p_{prosthesis}(t))$
13: return $p_{mapped}(t)$

The Semi-Autonomous Grasping Controller
Based on the visual object recognition and the human grasp database, we present a semi-autonomous control scheme for prosthetic hands. The control scheme automates part of the grasping process to reduce the cognitive burden of the user. At the same time, the user can influence or stop the grasping process at any time to stay in control of their prosthetic hand. The control flow of the semi-autonomous control scheme, including the usage of sensor information, object and grasp databases and user commands, is depicted in Fig. 6. An architectural diagram of the semi-autonomous control scheme is depicted in Fig. 7 and the finite state machine implementing the control scheme is shown in Fig. 8. Sensory data recorded throughout grasp executions is shown in Fig. 9 and Fig. 10. Video 1 shows and explains the procedure of the semi-autonomous control. The user triggers actions of the prosthesis via muscle activations measured by a single EMG channel. Status information is presented to the user on the display at the back of the hand. Once the object to be grasped is identified based on visual information and object knowledge in the object database, the user's intention to grasp the object of interest is recognized and an appropriate grasp from the grasp database is selected. The recognized object and selected grasp type (top or side grasp) are suggested to the user on the hand display. Both the object and the selected grasp can be changed by the user.
Figure 6: Steps of the semi-autonomous controller, beginning with the first step on the left. User input is explicitly provided through an EMG signal and an arm rotation in the first two steps. Prior object knowledge in the object database is used for visual object recognition. Prior grasping knowledge in the grasp database is used for intention recognition and grasp selection. In the last two steps the grasp trajectory is performed on the prosthesis. User intervention is possible at any time.
The hand and wrist motion is triggered by the user via an EMG signal to bring the hand into a suitable preshape for the selected object and grasp. The wrist orientation with respect to the object is actively maintained based on IMU sensor data to compensate for unwanted orientation changes due to the reaching motion. Once the prosthesis is close enough to the object, it automatically closes the fingers based on the distance sensor information to firmly grasp the object.

User Intention Recognition and Grasp Selection
To start the grasping process, the user takes an image of the desired object by a single muscle activation measured with the EMG electrodes, as shown in the leftmost image of Fig. 6 and at the first dashed line in Fig. 9 and Fig. 10. The in-hand object recognition is run on this image which is recorded by the camera in the palm of the hand. Using the object information provided by the object recognition module, the object database is queried to retrieve detailed information about the given object including object properties and associated grasps. For each object, the following object properties are stored in the database: the three object dimensions, the weight of the object and its fragility. Grasps associated with the objects are stored in the human grasp database (see Section 3). Here, top and side grasps are associated with most objects except flat objects and spheres that only permit a top grasp.
Once the object is identified, the user is informed about the result of the recognition by showing the object name on the display. Based on the relation of the hand to the object, which is estimated from IMU data, a top or side grasp is automatically proposed by the hand controller. The IMU measurements for the grasp orientation suggestion are updated at 100 Hz. The user can continuously update the proposed grasp by rotating the prosthesis. In the current implementation, a top grasp is selected if the prosthesis is held horizontally, and a side grasp is selected if the prosthesis is held at an angle of more than ±15°. The proposed grasp and orientation are shown in different colors on the display to ease the selection process for the user, as shown in Fig. 6. The user intention, i.e. the target object to be grasped and the way to grasp it (top or side grasp), together with the object properties, is used to select, parametrize and execute the grasp using the corresponding trajectories from the human grasp database.
Video 1: Video demonstrating the different aspects of the semi-autonomous control scheme.
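The orientation-based grasp proposal described above can be illustrated by a simple threshold rule. This is a minimal sketch, assuming the IMU provides a single tilt angle of the prosthesis relative to the horizontal and that an angle of exactly 15° still counts as horizontal; the function and parameter names are hypothetical.

```python
def propose_grasp(
    tilt_deg: float,
    side_grasp_available: bool = True,
    threshold_deg: float = 15.0,
) -> str:
    """Propose "top" or "side" from the tilt of the prosthesis.

    Objects that only permit a top grasp (flat objects and spheres)
    are handled by setting side_grasp_available to False.
    """
    if side_grasp_available and abs(tilt_deg) > threshold_deg:
        return "side"
    return "top"
```

Calling this function on every new IMU sample would update the proposal continuously while the user rotates the arm.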
It is important to emphasize that the user is able to interact with the hand during the entire process by confirming or rejecting alternatives generated by the control scheme. If the user is satisfied with the proposed grasp, they can confirm it and trigger the execution using a single EMG channel, as marked by the second dashed line in Fig. 9 and the third dashed line in Fig. 10. The same EMG channel can be used to trigger the object recognition and to start the grasp execution. Otherwise, the user is able to change the grasp direction by repositioning their arm relative to the object. In case of a wrong object classification, the user can reject the proposed grasp by shaking the hand, as marked by the second dashed line in Fig. 10. This movement is recognized using the IMU. After a rejection, the control scheme selects the object with the next highest recognition probability. If the first three proposed grasps are rejected by the user, the controller can be restarted by requesting a new camera image for the object recognition.
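The rejection behaviour described above can be sketched as a small helper that walks through the ranked recognition results. This is an illustration only; the class name, its interface and the example labels are assumptions, and the detection of the shake gesture itself is not covered.

```python
from typing import List, Optional, Sequence


class ObjectSuggestion:
    """Cycle through ranked object labels after user rejections."""

    def __init__(self, ranked_labels: Sequence[str], max_rejections: int = 3):
        if not ranked_labels:
            raise ValueError("at least one recognised object is required")
        self._ranked: List[str] = list(ranked_labels)
        self._max_rejections = max_rejections
        self._index = 0

    @property
    def needs_new_image(self) -> bool:
        """True once all allowed suggestions have been rejected."""
        return self._index >= min(self._max_rejections, len(self._ranked))

    @property
    def current(self) -> Optional[str]:
        """Current suggestion, or None if a new image should be requested."""
        if self.needs_new_image:
            return None
        return self._ranked[self._index]

    def reject(self) -> Optional[str]:
        """Handle a rejection and return the next most likely object."""
        if not self.needs_new_image:
            self._index += 1
        return self.current


# Hypothetical ranking, ordered by decreasing recognition probability.
suggestion = ObjectSuggestion(["banana", "pitcher", "cup", "bowl"])
```

After three calls to `reject()`, `needs_new_image` becomes true and the caller would request a new camera image.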

Preshape Motion and Grasp Execution
Once a grasp is confirmed by the user, both hand and wrist pregrasp trajectories are selected from the human grasp database and executed as shown in Fig. 8. The pregrasp trajectory is executed while approaching the object to ensure feasible hand orientation and finger aperture. The hand preshape motion and the continuous wrist orientation are performed simultaneously. At the end of the pregrasp trajectory, the wrist motion is nearly finished, as can be seen at the third dashed line in Fig. 9 and the fourth dashed line in Fig. 10. The pregrasp and grasp poses are pictured in Fig. 6 on the right.
Figure 9: Sensory data throughout one grasp execution with the semi-autonomous control; the two user inputs can be clearly seen in the EMG signal, pregrasp and grasp motion can be distinguished in the finger closure, the shaft rotation reflects the angle of the wrist throughout grasp execution and the final grasp is triggered by a low measured distance to the object
Once the pregrasp motion is finished, the wrist is controlled to maintain the preshape orientation relative to the gravity vector using IMU sensor data, compensating rotations caused by the user's arm movements. Thereby, a correct hand orientation is ensured regardless of arm reconfiguration which might be required to reach the object, adjust grasping distance or avoid obstacles. In this way, compensatory motions of the shoulder should be prevented as the user does not have to take the influence of their approach movement into account.
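As a rough illustration of this compensation, the following sketch treats the wrist as a single rotational degree of freedom and assumes that the roll of the forearm relative to gravity is available from the IMU data. Under these assumptions, which are not stated in this form in the text, the wrist angle is simply the difference between the desired hand roll and the current forearm roll.

```python
def wrap_deg(angle_deg: float) -> float:
    """Wrap an angle to the interval [-180, 180) degrees."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def wrist_command(target_hand_roll_deg: float, forearm_roll_deg: float) -> float:
    """Wrist rotation that keeps the hand at the target roll w.r.t. gravity.

    Assumes hand roll = forearm roll + wrist angle (one rotational DoF).
    Both input angles are hypothetical quantities derived from the IMU.
    """
    return wrap_deg(target_hand_roll_deg - forearm_roll_deg)


# If the user rolls the forearm by 30 degrees while reaching, the wrist
# command drops from 90 to 60 degrees so that the hand stays at 90 degrees.
before = wrist_command(90.0, 0.0)
after = wrist_command(90.0, 30.0)
```

A real controller would additionally respect the mechanical range and speed limits of the wrist, which are omitted here.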
With the distance sensor in the palm of the prosthesis, the distance to the object is continuously measured. As soon as the distance between prosthesis and object falls below a predefined threshold and the prosthesis has reached the final posture of the pregrasp, the grasp motion is triggered and the grasp trajectory is executed. This is marked by the third dashed line in Fig. 9 and the fourth dashed line in Fig. 10. Finally, a closing force is applied. The amount of this force depends on the fragility and weight defined by the object's properties stored for each object in the object database. Once the final grasp is completed, the object can be lifted.
Figure 10: Sensory data of a grasp execution with initially wrong object recognition; a clear shaking of the hand can be seen in the wrist angle measurement issued by the user to alter the object suggestion before starting the grasp
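The trigger condition combines two requirements: the measured distance is below the threshold and the pregrasp posture has been reached. A minimal sketch is given below; the threshold value of 30 mm, the units and all names are placeholders, since the text only mentions a predefined threshold.

```python
from typing import Iterable, Optional, Tuple


def should_trigger_grasp(distance_mm: float, pregrasp_done: bool,
                         threshold_mm: float = 30.0) -> bool:
    """True when both trigger conditions from the text are met.

    threshold_mm is a hypothetical value chosen for illustration.
    """
    return pregrasp_done and distance_mm < threshold_mm


def first_trigger_index(samples: Iterable[Tuple[float, bool]]) -> Optional[int]:
    """Index of the first sample that triggers the grasp, or None."""
    for i, (distance_mm, pregrasp_done) in enumerate(samples):
        if should_trigger_grasp(distance_mm, pregrasp_done):
            return i
    return None


# Hypothetical approach: the hand is briefly close to the object before the
# preshape is finished (no trigger), then approaches again with the preshape done.
approach = [(120.0, False), (25.0, False), (40.0, True), (28.0, True)]
trigger_at = first_trigger_index(approach)
```

Requiring both conditions prevents the hand from closing while the fingers are still moving into the preshape.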
At any time the grasping process can be stopped and aborted by a shaking movement of the prosthesis detected by the IMU as described in Section 4.1. The semi-autonomous control scheme focuses on the acquisition of a stable grasp. After the grasp is completed, the user can lift and use the object as needed and release it when this action is triggered by another muscle activation signal measured by the EMG electrodes.

Experiment Design
To assess the functionality, intuitiveness and complexity of the proposed semi-autonomous control, a user study is performed comparing it to a conventional sequential control approach. A third control strategy with reduced autonomous functionality is additionally included to assess the influence of increasing autonomy of the hand on user experience and to find the optimal tradeoff between supporting functionality and user control. Hence, we compare three control strategies, which are all operated by the user via a standard two-channel EMG input. Both EMG channels are only required for the conventional sequential EMG control, which serves as the baseline. The two semi-autonomous control schemes need only a single EMG input channel, so signals from both electrodes are accepted equally.

Conventional Sequential Control (CSC)
This sequential control approach allows either the wrist rotation or the opening and closing of thumb and fingers simultaneously with a fixed coordination. The two available electrode signals are thereby mapped to the two rotation directions or the opening and closing of the hand respectively. To switch between wrist rotation and hand control, both EMG electrodes have to be addressed simultaneously by a co-contraction of both muscles. This control approach is common in commercial hand prosthetics (see [42,43,44,45]) and represents the baseline for the comparison of our method.
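The sequential scheme can be viewed as a small two-mode state machine. The sketch below is an illustration under assumptions: the normalized activation threshold, the assignment of the extensor signal to hand opening and the assignment of rotation directions to electrodes are hypothetical, and a practical implementation would debounce the co-contraction so that one contraction causes exactly one mode switch.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    HAND = "hand"
    WRIST = "wrist"


def csc_step(mode: Mode, flexor: float, extensor: float,
             threshold: float = 0.5) -> Tuple[Mode, str]:
    """One control step: returns the (possibly switched) mode and an action.

    flexor/extensor are normalized electrode activations (assumed range 0..1).
    """
    flexor_on = flexor > threshold
    extensor_on = extensor > threshold
    if flexor_on and extensor_on:
        # Co-contraction toggles between hand and wrist control.
        return (Mode.WRIST if mode is Mode.HAND else Mode.HAND), "switch"
    if not flexor_on and not extensor_on:
        return mode, "idle"
    if mode is Mode.HAND:
        return mode, "close" if flexor_on else "open"
    # Assumed mapping of electrodes to rotation directions.
    return mode, "rotate_cw" if flexor_on else "rotate_ccw"


step1 = csc_step(Mode.HAND, flexor=0.9, extensor=0.1)
step2 = csc_step(Mode.HAND, flexor=0.9, extensor=0.9)
step3 = csc_step(Mode.WRIST, flexor=0.1, extensor=0.9)
```

The sketch makes the source of the sequential overhead visible: every change between wrist rotation and hand closing costs an additional co-contraction.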

Semi-Autonomous Control (SAC)
The semi-autonomous control applies our approach described in Section 4, including object recognition based on the visual information, predefined grasp trajectories learned from human demonstrations and automatic hand closing based on a distance sensor located at the base of the thumb. All user commands, namely the start of the object recognition and the confirmation of a grasp proposed by the control scheme of the hand, can be generated by contracting either one or both of the muscles to which EMG electrodes are attached. Therefore the user can issue control commands with the EMG signals that are easiest to generate for them. Aborting the current action is always possible by a fast and short shake of the prosthesis.

Semi-Autonomous Preshape (SAP)
Since the final hand closing is crucial for grasp success, this third control strategy allows an individual timing of the hand closing motion by the user. The preshape of the hand and the preparatory wrist orientation are executed as in the SAC strategy. The maximum grasping force in this mode is also set as in SAC based on the information in the object database and human grasp database. However, hand closing is not triggered automatically based on the hand-object distance, but is instead actively controlled by the user. While the first two control inputs can, as in the SAC strategy, be triggered by any muscle activation, hand closing is controlled by contracting the flexor muscles as in the CSC strategy. During this process, the finger and thumb trajectories are still derived from the human demonstrations and are therefore adapted according to the chosen grasp. As long as the user sends an EMG signal, the hand closes along the trajectory from the human grasp database. If the EMG signal is paused, the execution of the trajectory is paused as well, until an EMG signal is received again and the trajectory is continued.
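The EMG-gated playback of the closing trajectory can be sketched as follows. The trajectory is represented only by an index into its waypoints, and a boolean per control step states whether the flexor signal is active; both representations and all names are assumptions made for illustration.

```python
from typing import List


def advance(index: int, emg_active: bool, length: int) -> int:
    """Move one waypoint forward while the EMG signal is active."""
    if emg_active and index < length - 1:
        return index + 1
    return index  # paused, or end of the trajectory reached


def play(emg_samples: List[bool], length: int) -> List[int]:
    """Waypoint index after each control step for a given EMG sequence."""
    index = 0
    visited = []
    for active in emg_samples:
        index = advance(index, active, length)
        visited.append(index)
    return visited


# Hypothetical input: the user contracts, pauses for two steps, then continues.
visited = play([True, True, False, False, True, True, True], length=5)
```

The trajectory shape stays the one taken from the human grasp database; the user only controls its timing.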

Setup and Procedure
The user study is performed with 20 able-bodied subjects wearing the prosthesis connected via the self-experience shaft on their right arm as depicted in Fig. 3. Of the nine female and eleven male subjects, ten had a background in robotics and five had no technical background. None of the subjects had experience with hand prosthetics or EMG control. The study was carried out in accordance with the recommendations of the ethical committee of the Karlsruhe Institute of Technology. The protocol was approved by this ethical committee and all subjects gave written informed consent.
The EMG electrodes are positioned for each subject individually and the electrode sensitivity is adjusted to maximize the signal quality. Electrode configurations are then kept fixed over the entire study session. During the experiment, the subject is positioned in a comfortable standing position in front of a table. A subset of the objects contained in the human grasp database is used for this user study. Ten different objects, chosen from a household environment, are successively placed on the table in front of the subject. The objects are depicted in Fig. 11. All three control strategies are evaluated consecutively in randomized order. Each control strategy is explained to the subjects by the experimenters. Subjects are given one minute prior to the evaluation to familiarize themselves with the control and practice with an eleventh object not included in the evaluation. To begin each grasp, the prosthesis is positioned 13 cm to the front right of the object. An example of this experimental setup is depicted in Fig. 3. For each control strategy, the subject is asked to grasp all objects from the top first, then from the side if the object allows a side grasp, resulting in 16 grasps in total. If a grasp fails in the first attempt, it can be repeated once. If it fails again in the second attempt, the experiment is continued with the next grasp. The failed grasp is then excluded from the quantitative measurement of grasp time and muscle activation, but is still taken into account by the subjects in the evaluation of control perception and workload. Each subject performs all three control strategies. The study is conducted with a counterbalanced crossover design of the control strategies. This means that the order of control strategies in the experiments is randomized with a similar number of participants starting with each control strategy.
Additionally, the order of objects is randomized between subjects but kept constant for all three control strategies within one subject.
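One possible way to generate such an assignment is sketched below. It is not the procedure used in the study, which is not described in detail. The sketch cycles through the six possible strategy orders so that the number of subjects starting with each strategy differs by at most one, and it draws one object order per subject that is reused for all three strategies. Function names and seeds are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations
from typing import List, Sequence, Tuple

STRATEGIES = ("CSC", "SAP", "SAC")


def counterbalanced_orders(n_subjects: int, seed: int = 0) -> List[Tuple[str, ...]]:
    """Assign each subject one of the six strategy orders in a balanced way."""
    perms = list(permutations(STRATEGIES))
    # Interleave so that consecutive orders start with different strategies.
    orders = perms[0::2] + perms[1::2]
    assigned = [orders[i % len(orders)] for i in range(n_subjects)]
    random.Random(seed).shuffle(assigned)  # randomize which subject gets which order
    return assigned


def object_order(objects: Sequence[str], subject_id: int, seed: int = 0) -> List[str]:
    """Object order for one subject; identical for all three strategies."""
    rng = random.Random(seed * 100003 + subject_id)
    order = list(objects)
    rng.shuffle(order)
    return order


orders = counterbalanced_orders(20)
first_counts = Counter(order[0] for order in orders)
```

With 20 subjects and three strategies a perfectly equal split is impossible, so two strategies are used as the starting condition seven times and one six times.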

Data Acquisition
To assess the performance of the semi-autonomous control scheme, several metrics are acquired in the user study. The grasp execution time is used as a metric for the grasp efficiency. To this end, the time from the beginning of the grasp until lifting the object is recorded. As the quality of the object recognition is not a central part of the presented semi-autonomous control, the time required to discard wrong object recognitions is assessed individually. To quantify the required amount of physical effort, the EMG activation signal over the duration of the grasping process is recorded as a quantitative metric. To assess complexity, success and user impression of each control strategy, a subjective questionnaire is collected. It provides the workload as measured by the NASA task load index (NASA TLX) [46]. In our evaluation we aim to compare the workload of the different control schemes in each subject. Therefore, we apply the metric of the raw TLX and directly calculate the unweighted average of the sub-scale ratings provided by the subjects. Compared to the individual weighting of sub-scales, this method has been found to be more sensitive [47]. The questionnaire is extended by several questions to quantify intuitiveness of the control, feeling of control and perception of feedback in the same style as the questions of the workload index. Furthermore, open questions on the subject's impression and preferences are asked.
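The raw TLX score is the unweighted mean of the six NASA TLX sub-scale ratings. A minimal sketch is given below; the example ratings are invented for illustration and are not data from the study.

```python
from typing import Dict

# The six sub-scales of the NASA Task Load Index.
SUBSCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: Dict[str, float]) -> float:
    """Unweighted average of the six sub-scale ratings (raw TLX)."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing sub-scale ratings: {missing}")
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


# Invented ratings of one subject for one control strategy.
example = {
    "mental demand": 10,
    "physical demand": 17,
    "temporal demand": 8,
    "performance": 6,
    "effort": 15,
    "frustration": 10,
}
score = raw_tlx(example)  # (10 + 17 + 8 + 6 + 15 + 10) / 6
```

In contrast to the weighted TLX, no pairwise comparison of the sub-scales is needed, which shortens the questionnaire for each of the three control strategies.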

Results
The proportion of users preferring each control strategy is depicted in Fig. 12. Of all participants, 65.2 % preferred the SAC control compared to the two other strategies. The results of the evaluating questionnaire and the recorded EMG signals are depicted in Fig. 13. The reported preference of the SAC control strategy is also visible in the control intuitiveness as shown in Fig. 13 a). All plots show the shape of the kernel density function around the data points. The median is marked as a white dot, while the grey line denotes the range between the 25th and 75th percentile of the data. The colored points denote the answers/measurements of individual subjects. The horizontal distribution of data points is merely for visualization purposes. Throughout the trials of each control scheme, no significant learning was observed from the subjects.
Figure 13: Outcomes of the user study: (a) intuitiveness of the control reported by the subjects, (b) workload according to the NASA Task Load Index [46], (c) effort put into the grasp execution, (d) physical demand of the control strategy, (e) mean muscle contraction signal over the entire recording and both EMG electrodes and (f) feeling of control reported by the subjects; all graphs show the data points together with the kernel density function, the median is marked by a white dot and the grey line marks the section between the first and third quartile

Workload and Control Intuitiveness
The workload index of both SAP and SAC is significantly lower than for CSC (Friedman's ANOVA, p < 0.05), as depicted in Fig. 13 b). Friedman's ANOVA is a non-parametric statistical test for differences between related groups, such as repeated measurements on the same subjects. The NASA Task Load Index [46] ranges between 1 and 20 with higher numbers representing an increasing overall task load. With a median of 11.3, the workload index of CSC is almost twice as high as for SAC with 6.2 and almost one third higher than for SAP with 8.6. In the following, all results from the subjective questionnaire except the NASA TLX are converted from the scale between 0 and 20 to percent.
The high workload index of CSC is mainly caused by a high physical demand of 85 % in the median and a high required effort of 75 %. A significant reduction (Friedman's ANOVA, p < 0.05) is achieved with SAC for the median of both the physical demand, to 25 %, and the effort, to 40 %. For SAP, the physical demand is also notably decreased to 47.5 % in the median compared to the common baseline of CSC. The amount of required effort and physical demand is visualized in Fig. 13 c) and d).
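The reported percentages can be related to each other with two small helper functions. The sketch assumes a linear mapping of the 0 to 20 questionnaire scale to percent, so that 85 % and 25 % correspond to ratings of 17 and 5. The helper names are hypothetical. The drop in physical demand from 85 % to 25 % is 60 percentage points, which is a relative reduction of about 70.6 % with respect to the CSC baseline.

```python
def to_percent(rating: float, scale_max: float = 20.0) -> float:
    """Convert a questionnaire rating on a 0..scale_max scale to percent."""
    return 100.0 * rating / scale_max


def relative_reduction_percent(baseline: float, value: float) -> float:
    """Reduction of `value` relative to `baseline`, in percent of the baseline."""
    return 100.0 * (baseline - value) / baseline


csc_physical = to_percent(17.0)  # 85 %
sac_physical = to_percent(5.0)   # 25 %

points = csc_physical - sac_physical                              # percentage points
relative = relative_reduction_percent(csc_physical, sac_physical)  # percent of baseline
```

Keeping percentage points and relative reductions apart avoids ambiguity when the same result is quoted in different parts of a paper.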
The observed physical demand is clearly reflected in the use of EMG control signals. The EMG electrodes supply a filtered output voltage correlated to the muscle activation signal. Fig. 13 e) shows the average EMG activation calculated by integrating the EMG voltage of both electrodes over the grasp trial and normalizing it by the grasp execution time. While grasping with CSC requires a median electrode activation of 203.7 mV, only 69.4 mV is recorded in SAC. This clearly shows the lower muscle contraction due to the introduced autonomous functionality. In CSC, an EMG electrode activation is recognized three times more frequently than in SAC, indicating that the reduction of muscle contraction is mainly caused by reducing the number and length of necessary user inputs. As the reported intuitiveness shows, this input reduction can be achieved without a loss of trust in the device. Besides, subjects did not report any statistically significant difference in their feeling of control between CSC with a median of 60 % and SAC with 62.5 %, as shown in Fig. 13 f). As expected, SAP has a higher average electrode activation than SAC. Nevertheless, SAP still results in a significantly lower median muscle activity of 111.2 mV compared to CSC with 203.7 mV.
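The time-normalized EMG activation can be sketched with a trapezoidal integration over the sampled electrode voltages. How the two electrodes are combined is not specified in detail; the sketch assumes that the two channels are averaged per sample, and the signal values are invented for illustration.

```python
from typing import Sequence


def mean_activation_mv(times_s: Sequence[float],
                       ch1_mv: Sequence[float],
                       ch2_mv: Sequence[float]) -> float:
    """Time-normalized integral of the EMG voltage over one grasp trial."""
    if not (len(times_s) == len(ch1_mv) == len(ch2_mv)) or len(times_s) < 2:
        raise ValueError("need at least two samples of equal length per channel")
    # Assumption: both electrodes are averaged per sample.
    combined = [0.5 * (a + b) for a, b in zip(ch1_mv, ch2_mv)]
    area = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        area += 0.5 * (combined[i] + combined[i - 1]) * dt  # trapezoidal rule
    return area / (times_s[-1] - times_s[0])


# Invented trial of 4 s with one contraction on the first electrode only.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
ch1 = [0.0, 200.0, 200.0, 0.0, 0.0]
ch2 = [0.0, 0.0, 0.0, 0.0, 0.0]
activation = mean_activation_mv(times, ch1, ch2)
```

Normalizing by the trial duration makes trials of different length comparable, so a strategy is not penalized merely because a grasp took longer.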
For CSC, ten subjects stated that the switching of control modes between hand and wrist motion was tedious, hinting at the co-contraction and mode switching as one of the major sources of the high workload. One subject stated that the grasping in SAC did not require attention and three subjects described hand closing in SAP to be very intuitive. The intuitiveness of SAC increased by 30 % in the median compared to CSC. As shown in Fig. 13 a), all three control strategies have an intuitiveness median of more than 50 %, with SAC being most intuitive with a median of 85 %. In addition, a quarter of participants reported the SAC to be very intuitive when asked to describe their impression of the presented control in their own words in the questionnaire.

Grasp Execution Time and Grasp Success
The grasp execution time was measured as the time needed to reach the object, grasp it and lift it off the table surface. The median execution time over all subjects and objects is 9.8 s for CSC, 9.7 s for SAP and 8.4 s for SAC. The grasp execution time depends strongly on the quality of the object recognition. Over all grasps performed in SAC and SAP, subjects were on average 41.6 % faster if the object was classified correctly in the first attempt. A correct intention recognition and hence a correct suggestion of the grasp direction sped up the grasping time by 7.0 %. For the remainder of the evaluation, the time spent on wrong classifications by the object recognition will not be considered, as the quality of the object recognition is not a central part of this work. Excluding this time leads to a reduction of the median execution time to 7.9 s for SAC and 9.2 s for SAP. While the time required for grasping is very long compared to humans grasping with their able hand, it is still fast compared to the commercially used conventional sequential control scheme CSC. Considering that the naive subjects had only one minute of training prior to the experiments, the grasp time of 9.8 s measured for CSC is well within the range observed in literature with the same control scheme [20].
A significant difference in the overall time required for a grasp is only notable between SAC and the other two control strategies SAP and CSC, respectively. The grasp execution time in SAC is 19.4 % faster compared to CSC. A quarter of the subjects specifically mentioned the SAC to be perceived as very fast. This was mainly ascribed to the automatic hand closing, which was perceived as very helpful. The average grasp execution time for all individual grasps is depicted in Fig. 14, showing the median with a solid line and a box from the first to the third quartile of measured grasping times. Overall, it can be seen that CSC has a notably larger variance than SAP and SAC. Large and bulky objects like the football, the bowl or the canned meat are grasped from the top at a similar speed with all three control strategies. The merit of the autonomous coordination of all degrees of freedom of the hand becomes mainly apparent in objects which need a precise grasping strategy like the top grasps on the fizzies and chips. This is also evident for grasps that demand a large wrist rotation compared to the starting pose like the side grasps on the fizzies and the canned meat.
Furthermore, the time required for object detection, intention recognition and the control interaction with the user in SAC and SAP is assessed individually. The object detection time is measured from the EMG activation command issued by the user until the correct object is recognized and presented on the display. It therefore includes the time for potential misclassification. The object detection required an average time of 1.6 s. The time for intention recognition is measured from the moment of correct object detection until the correct grasp direction is suggested, including the time needed to rotate the hand if the grasp direction is inferred incorrectly. The intention recognition took on average 0.7 s. The interaction time is measured from the moment object and grasp direction are presented correctly until the user issues the EMG command that starts the grasp. This includes the time the user needs to read and check the grasp suggestion before confirming it. The interaction time amounts to 1.3 s on average.
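The three intervals defined above can be computed from four event timestamps of one trial. In the following sketch the event names are hypothetical, and the example timestamps are chosen so that the intervals reproduce the reported averages; they do not stem from an actual trial.

```python
from typing import Dict


def phase_durations(t_start_cmd: float, t_object_correct: float,
                    t_direction_correct: float, t_confirm_cmd: float) -> Dict[str, float]:
    """Split the time before grasp execution into the three phases (seconds)."""
    stamps = [t_start_cmd, t_object_correct, t_direction_correct, t_confirm_cmd]
    if stamps != sorted(stamps):
        raise ValueError("event timestamps must be in chronological order")
    return {
        # EMG start command -> correct object shown (includes misclassifications)
        "object_detection": t_object_correct - t_start_cmd,
        # correct object -> correct grasp direction suggested
        "intention_recognition": t_direction_correct - t_object_correct,
        # correct suggestion -> confirming EMG command
        "interaction": t_confirm_cmd - t_direction_correct,
    }


durations = phase_durations(0.0, 1.6, 2.3, 3.6)
total = sum(durations.values())
```

With the reported averages, the three phases add up to roughly 3.6 s before the pregrasp motion starts.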
Looking at the grasp success for the 16 different grasps reveals that subjects were overall comparably effective in grasping objects with SAC and SAP. The attempts needed to successfully perform the different grasps with each control scheme are shown in Fig. 15. In total, four of the 16 grasps in the conducted experiment could be executed successfully in the first trial by all participants in CSC, while there were five grasps without failure in SAC and eight in SAP. Although SAP proves to be the most effective control strategy on this basis, participants preferred SAC. In addition, half of them commented on SAC being easy to control. A reason for this discrepancy might be found in the difficulties of SAP for specific cases, especially the top grasp on a package of fizzy tablets, which can be clearly seen in Fig. 15. As these have a small diameter, accurate hand positioning is important. Keeping the exact hand position while performing a muscle contraction to close the hand was difficult for many subjects. On this specific grasp, subjects needed on average 2.1 grasp attempts to successfully lift the object. For this reason, object slip occurred more often in SAP than in both SAC and CSC. No object was knocked over in SAC, while this happened once in SAP and twice in CSC.
Figure 15: Attempts needed to achieve a successful grasp on all ten objects and the overall number of grasp attempts in the three evaluated control strategies
Additionally, in CSC subjects were frequently struggling with undesired wrist rotation during grasping, which in one case caused the grasp attempt to fail entirely. This is directly prevented by the semi-autonomous control schemes, since the tedious and unreliable switching between wrist rotation and hand closing is not required. Finally, the grasp force control of SAC fully prevents grasp failures due to insufficient grasp force, which occurred five times in CSC and six times in SAP. In addition we observed that some subjects crushed the fragile bandaid package in CSC by applying too much grasp force, which was prevented by the control in SAP and SAC.
The quality of neither the object recognition nor the intention recognition had a significant influence on grasp success. The difference in grasp success rate over all grasps in SAC and SAP was 0.2 % with a successful object recognition compared to cases where several object suggestions were needed. Comparing an instantly correct intention recognition with cases where the user had to correct the suggested grasp orientation by slightly rotating the prosthetic hand, grasp success varies by 0.4 %.

Discussion and Conclusion
In this work we propose a semi-autonomous control scheme that automatically chooses and executes a grasp trajectory and wrist orientation based on visual object recognition. With a single EMG channel, a starting command invokes a CNN for object recognition on an image from the camera in the palm of the hand. The object identity is then presented to the user together with the approach direction from the top or side. The approach direction can be changed via the IMU by slightly tilting the forearm. If the user is satisfied with the suggested object and approach direction on the display, a single EMG command starts the execution of a coordinated trajectory of fingers, thumb and wrist to form a preshape. The wrist orientation is continuously adapted relative to the gravity vector to compensate for orientation changes during the reaching motion. As soon as the hand reaches the object, the grasp is triggered by a signal from the distance sensor. The hand closes with a predefined maximum grasp force dependent on the object. All necessary sensors are embedded into the prosthesis and the control scheme is running on the embedded system inside the palm, eliminating the need for external sensors and devices. Grasp trajectories for the objects are learned from human demonstration. The whole control scheme can be operated using a single EMG channel and motion input sensed by the IMU. Based on sensor information directly acquired on the prosthetic hand, context and user intention are deduced and exploited to propose suitable grasps to the user. With a single EMG channel, the user is able to start the semi-autonomous grasping process and choose the desired trajectory. Grasp trajectories and object properties from an object database are selected by an image-based object recognition. The approach direction is deduced from the user's forearm orientation measured by an IMU within the prosthetic hand. 
Once the user has started the grasping motion via an EMG command, a preshape is performed resulting in an appropriate hand orientation and finger aperture to approach the object. The final grasp is triggered based on a distance sensor as soon as the hand has reached the object.
Compared to a conventional, sequential EMG control, our semi-autonomous control requires less than half the average EMG activation and the physical demand is rated 70.6 % lower in the median. Together with an increase of the intuitiveness by 30 %, this causes a significant reduction of the workload by 25.9 %. As a consequence, the prosthesis user has to concentrate less on the performance of a stable grasp. In addition, this reduced workload allows for faster grasping, especially for thin and delicate objects. The naïve subjects achieved a median grasping time of 7.9 s with the semi-autonomous control. This lies well within the range of semi-autonomous control schemes presented in literature [16,20] and is notably faster than the baseline conventional sequential control both in our evaluation and in literature [20]. At the same time, the feeling of control is comparable to the conventional sequential control as the user is able to intervene at any moment.
Due to the required object detection, the presented control is limited to known objects and is currently meant for frequently used objects. In a setup phase, the user could take images of frequently used objects with the prosthesis, which can then be used to train the object detection. The training could for example be accomplished by uploading the images to a smartphone or PC via Bluetooth, where the object detection is trained and its result written back to the prosthesis. After the object detection is adapted, the user would then be able to use the proposed control with her/his personalized set of objects. This means that the semi-autonomous control can be personalized to the specific objects a user frequently grasps with her/his prosthetic hand. It thereby complements the common, manual control to reduce the cognitive burden on the user in situations which are encountered repetitively in daily life. While the number of objects in this work is fixed to 13 due to the limitations of the used microcontroller, the use of FPGAs can greatly increase memory and computing power for vision applications [48,49,50].
In comparison to the related work presented in Section 1, the strength of our control scheme is that it relies only on on-board components and does not require any external sensors or computation resources. To the best of our knowledge, it is the first semi-autonomous control that operates entirely on the prosthetic hand. Several previous works choose a grasp preshape based on the object's overall shape and are therefore able to give grasp suggestions also for unknown objects with the use of external sensors and computing resources [16,18,20]. Others present sophisticated object classification for a significantly larger set of objects, again making use of external computation power [24,25]. However, extensive sensor setups and external computing resources restrict the flexibility in using a prosthetic hand in everyday activities. Our semi-autonomous control system is therefore developed to overcome such limitations and pave the way towards the next generation of prosthetic hands that integrate the sensors and computing power to facilitate a symbiotic interaction with the user. Furthermore, our approach does not only provide an automatic grasp preshape, as usually proposed in semi-autonomous control. It additionally provides hand closing trajectories, so that the user does not need to worry about the timing and velocity of finger and thumb closing. To the best of our knowledge, our semi-autonomous control is also the first scheme that allows simultaneous wrist orientation and hand closing. These simultaneous motions are beneficial to increase overall grasp speed and to adapt the hand orientation to further optimize the grasp acquisition, especially for thin objects.
In the future we plan to further analyze the workload distribution in our semi-autonomous control by conducting a psychological study. Thereby we aim to get a fine-granular picture of the workload distribution over the entire grasping task and to identify parts of the control that benefit most from further improvement. In addition we plan to extend our work by the inclusion of additional haptic sensor modalities to allow for closed-loop grasp force control. In this case the grasping force saved in the object database would serve as an initial control target that is then updated based on normal and shear forces as well as slip detection. This would enable the prosthesis to react to changes in the object, for example while pouring liquid out of a grasped bottle, which is currently not modeled by the static grasping force saved in the object database. The integration of an additional object pose estimation based on the camera images would make it possible to dynamically adapt the grasp to tilted objects.