Real-Time sEMG Pattern Recognition of Multiple-Mode Movements for Artificial Limbs Based on CNN-RNN Algorithm

Abstract: Currently, sEMG-based pattern recognition is a crucial and promising control method for prosthetic limbs. A 1D convolutional recurrent neural network classification model for recognizing online finger and wrist movements in real time was proposed to address the issue that classification accuracy and time delay cannot be optimized simultaneously. This model effectively combines the advantages of the convolutional neural network and recurrent neural network. Offline experiments were used to verify the recognition performance of 20 movements, and a comparative analysis was conducted with CNN and LSTM classification models. Online experiments via the self-developed sEMG signal pattern recognition system were established to examine real-time recognition performance and time delay. Experimental results demonstrated that the average recognition accuracy of the 1D-CNN-RNN classification model reached 98.96% in offline recognition, significantly higher than that of the CNN and LSTM (85.43% and 96.88%, respectively, p < 0.01). In the online experiments, the average accuracy of the real-time recognition of the 1D-CNN-RNN reached 91% ± 5%, and the average delay was 153 ms. The proposed 1D-CNN-RNN classification model delivers higher real-time recognition accuracy and a shorter time delay, with no obvious sense of delay for the user, and is expected to provide efficient control for dexterous prostheses.


Introduction
A myoelectric prosthesis is a type of bionic prosthesis that uses surface electromyography (sEMG) signals from the body to direct mechanical components to make appropriate movements [1]. The control technique using sEMG as a signal source is the most similar to people's perception of the prosthesis because sEMG signals can reflect neural activity and contain information about muscle activities related to limb movements. Most commercial upper-limb prostheses are controlled by a "mode switch" when they have two or more degrees of freedom [2]. As the number of degrees of freedom increases, this method significantly raises the complexity of multi-degree-of-freedom prosthesis control and the switching burden on amputees, and lowers the directness and real-time responsiveness of prosthesis control.
A prosthetic control method based on sEMG signal pattern identification has been proposed to accomplish quick and intuitive prosthetic control with the further development of artificial intelligence algorithms and high-performance microprocessors. A subset of artificial intelligence known as "pattern recognition" assigns classifications to data based on their features or patterns [3]. Since the pattern recognition method extensively uses bioelectrical signals, which contain data such as the length of muscle contractions and the size of the sEMG signals of muscle contractions, the realization of multi-free prosthesis motion control becomes possible. Achieving multi-free prosthesis movement control not only improves the quality of life of persons with physical disabilities but also encourages their reintegration into society. The stability and viability of the pattern recognition algorithm, as the center of the EMG prosthesis control method, will significantly affect how well the prosthesis works.
Pattern recognition algorithms based on traditional classifiers and deep learning algorithms are the two categories into which existing sEMG-signal-based pattern recognition methods fall. "Traditional classifiers" means traditional classification methods based on machine learning, as distinguished from classification methods based on deep learning. sEMG control based on pattern recognition was initially proposed by Finley's group in 1967 [4]. They claimed that sEMG signals could be mapped to specific motion modes using a classifier created in advance, resulting in the generation of prosthetic motion control orders. In 2002, Hudgins et al. [5] developed an intelligent control strategy for regulating multifunctional prostheses based on sEMG signals and demonstrated that sEMG signals exhibit determinism at the beginning stage of a muscle contraction and can be used to differentiate between different forms of limb movement. The proposed method has attracted the attention of a large number of researchers and is considered a promising intelligent prosthetic control strategy. By adding a sliding window to the sEMG signal in 2003, Englehart et al. [6] divided the existing continuous signal into independent time windows and applied a linear discriminant analysis (LDA) classifier to subsequent motions. Since then, the classification of motion pattern recognition based on the sEMG signal, which guarantees continual action directives from the controller, has been widely used in the rehabilitation of patients with upper-limb dysfunction due to amputation. In order to categorize 12 different forearm movements in five healthy participants, Liu et al. [7] employed an autoregressive power spectrum (ARPS) as a feature set and LDA as a classifier, achieving an average error rate of 5.00%, superior to other feature sets. There have been extensive studies on classifiers other than LDA as action classification techniques.
Support vector machines (SVMs) were used by Futamat et al. to categorize forearm movement patterns [8], attaining an accuracy rate of 95.59%; they gathered sEMG signals from four forearm muscles under 10 different forearm movements. On 15 different finger movements performed by 15 people, Purushothaman and Vikas [9] examined the classification performance of SVM, LDA, and naive Bayes (NB) classifiers. Geethanjali [10] compared the classification performance of an SVM, LDA, and artificial neural network for six different hand movements, with the mean absolute value (MAV), waveform length (WL), zero crossing (ZC), slope sign change (SSC), and fourth-order autoregression (AR) coefficients extracted as feature sets; the linear SVM obtained the best classification results, reaching 92.8% accuracy. Amirabdollahian et al. collected sEMG signals from 26 people performing a series of gesture movements while wearing Myo armbands and examined the changes in classification accuracy under various combinations of SVM kernels and electrodes, showing that the linear kernel has an advantage over the polynomial and radial basis function kernels and that the combination of the linear kernel and eight-channel electrodes achieves the best accuracy of 94.9% [11]. Caesarendra et al. used ANFIS-based learning to classify the reduced features of five finger gestures [12]. Additionally, several studies applied unsupervised fuzzy clustering, K-nearest neighbor (KNN), and other pattern recognition techniques to achieve multi-motion pattern identification [13][14][15].
The manually extracted features are crucial to the classification performance of standard machine-learning-based sEMG pattern recognition. Traditional machine learning techniques cannot efficiently categorize and train on abstract, noisy, and high-dimensional data, and achieving high classification accuracy on unprocessed raw sEMG signals is a great challenge [16]. With the advancement of artificial intelligence technologies, deep learning has made significant progress in image categorization and motion pattern recognition and possesses powerful feature learning abilities. End-to-end sEMG pattern recognition can be accomplished without the laborious feature extraction stage of classical machine learning owing to its capacity to autonomously learn features at various degrees of abstraction from input samples.
The convolutional neural network (CNN) and recurrent neural network (RNN), two currently popular deep learning models, have been employed extensively in sEMG pattern recognition. A network model based on the CNN architecture was adopted by several teams [17][18][19][20][21] in an effort to boost the CNN's classification performance, including the creation of an instantaneous EMG image from the original sEMG signal [18], the use of the delayed sEMG spectrum as an input [19], a multi-stream decomposition stage and fusion stage to train the CNN model [20], and a CNN-based feature extraction method (CNNFeat) [21]. The RNN is a neural network model that uses dynamic internal state changes to interpret temporal information [22]. Classification models based on the RNN have been proposed by some teams [23][24][25][26][27], and the research findings demonstrated that the RNN model performs consistently on the delay problem. Additionally, LSTM, as an upgraded RNN model, can address the issues of gradient disappearance and long-term dependence [26]. A CNN-RNN composite neural network structure was presented by other teams [28][29][30][31][32] that can concurrently capture spatial and temporal information from sEMG grayscale images. The results show that the deep design can improve classification accuracy and robustness.
To sum up, most current research is still focused on the accuracy of pattern recognition in complex situations. Up to now, only a few teams [33][34][35] have conducted experiments to verify the real-time performance of a gesture recognition model, and the delay is too long to meet the needs of practical use. Although accuracy is the main research direction of pattern recognition algorithms, real-time performance is also one of the important factors in whether pattern recognition algorithms can be applied to upper-limb prostheses. The CNN combines feature extraction and classification and can obtain optimal features and classifier parameters from the original data. A 1D-CNN can effectively train on the limited one-dimensional data in the data set and has better real-time performance, while the RNN performs reliably on timing problems. Together they can improve the real-time performance of the model while maintaining considerable accuracy. Therefore, we combined the advantages of a 1D-CNN and an RNN to propose a one-dimensional convolutional recurrent network classification model (1D-CNN-RNN) suitable for motion pattern recognition based on sEMG signals. Since there is no consistent conclusion on whether the accuracy of offline action recognition can directly reflect the real-time performance of a pattern recognition system, we also designed a real-time experiment to verify the performance of the model.

Signal Acquisition
The multi-channel sEMG signals were collected using the commercial wearable gForcePro+ EMG armband (OYMotion Technologies Co., Ltd., Shanghai, China), which consists of eight dry electrodes with a sampling frequency of 1000 Hz and connects wirelessly to a recording computer via Bluetooth. The gForcePro+ armband was worn on the forearm, 2~3 cm from the elbow crease and the distal end of the olecranon process of the ulna, and covered the extensor carpi radialis, extensor digitorum, extensor carpi ulnaris, and flexor digitorum superficialis of the subject's forearm, as shown in Figure 1a,b. During the experiments, participants performed the corresponding upper-limb movements as directed by the guiding video while maintaining a modest level of muscular contraction. When the guiding video indicated a return to the normal condition, the subjects relaxed their upper limbs in the rest position. Figure 1c,d depicts the 20 movement patterns designed for this study, which include typical fundamental finger and wrist movements.

Offline Experiment
This study recruited 23 healthy subjects, with an average age of 22.78 ± 1.70 years, comprising 15 men and 8 women, 21 right-handed and 2 left-handed participants. Exclusion criteria were any neurological pathologies or musculoskeletal complaints interfering with study outcomes. Each subject provided written informed consent before the experiment. Each participant conducted the offline experiments using their right hand. Each movement was held for three seconds and repeated ten times, with 3 s of rest between repetitions and a 5 min break between different movements.


Online Experiment
The online sEMG signal acquisition and assessment experiments were implemented through our self-developed sEMG-specific movement recognition system, which is capable of performing sEMG signal acquisition, display and processing, and offline and online multi-movement recognition. Ten other healthy subjects, with an average age of 24.4 ± 1.51 years, including 6 males and 4 females, 6 right-handed and 4 left-handed, were recruited to participate in this experiment to demonstrate the practicality of online recognition. Before the experiment, all participants were informed of the procedure and signed the consent form.
The subjects conducted two experiments. The offline and online pattern recognition protocols and the schematic diagram of the overall experiments are displayed in Figure 3. A 1D convolutional recurrent neural network classification model for recognizing online finger and wrist movements in real time was proposed, as depicted in Section 2.2, which could effectively combine the advantages of a convolutional neural network and a recurrent neural network. The 1D-CNN-RNN, as the initial model, was trained in the offline mode before being applied to the recognition experiments. In order to optimize the classification, the classification models used in the online experiment were all based on the previous generation of models, and the data from the offline experiment were used for further iterative training. During this experiment, subjects performed the corresponding movements according to the guidance video displayed in the system; each movement was repeated 10 times, each repetition lasted 3 s, and the next movement followed after 3 s of relaxation, with a 5 min break between movements. At the same time, the experiment operator recorded the corresponding results displayed by the system. At the end of the experiment, the ratio of correctly recognized trials to total trials and the average delay of the real-time pattern recognition were counted.

Preprocessing
Data preprocessing mainly includes filtering and noise reduction, standardization, and active segment extraction. The motion artifact and electrical interference brought by cables were removed using a third-order 10 Hz Butterworth high-pass filter in accordance with the spectrum energy distribution of the sEMG signal.
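As a minimal sketch of this filtering step (assuming SciPy is available, and using the armband's 1000 Hz sampling rate from the Signal Acquisition section; zero-phase filtering is a choice suited to offline preprocessing, whereas a real-time pipeline would use a causal filter):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # gForcePro+ sampling frequency (Hz)

def highpass_semg(x, fs=FS, cutoff=10.0, order=3):
    """Third-order 10 Hz Butterworth high-pass to remove motion artifact
    and low-frequency cable interference.

    x: array of shape (n_samples, n_channels).
    """
    b, a = butter(order, cutoff, btype="highpass", fs=fs)
    # filtfilt applies the filter forward and backward (zero phase),
    # which is convenient for offline analysis.
    return filtfilt(b, a, x, axis=0)
```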
Due to factors such as differences in anatomical tissue and physiological condition across subjects, multi-channel sEMG signals showed obvious inter-subject differences. Therefore, standardization was adopted to transform the sEMG signals from different subjects and reduce the impact of individual variation on pattern recognition and classification. The standardization method used in this article is Z-score standardization, a frequently used technique: z = (x − µ)/σ, where µ is the mean and σ is the standard deviation. It converts data of different orders of magnitude into unitless values.
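The Z-score transform can be sketched as follows (applying it per channel is an assumption; the text does not state whether the statistics are computed per channel or globally):

```python
import numpy as np

def zscore_per_channel(x):
    """Z-score standardization z = (x - mu) / sigma, per channel.

    x: array of shape (n_samples, n_channels); returns unitless values
    with zero mean and unit standard deviation in each channel.
    """
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True)
    return (x - mu) / sigma
```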

The active segment is often extracted by a threshold recognition approach, which records any instantaneous values in the smoothed signal that are higher than the threshold. Nevertheless, using a fixed threshold as the active-segment detection standard produces significant errors for many sEMG signals. In order to identify the active portion of the sEMG signal, this study used an adaptive double-threshold approach, calculated as follows: (1) the single-channel sEMG data s_k(i) were differentially processed, and the instantaneous average energy sequence E was derived from the mean square energy of the eight-channel sEMG data (N = 8), E(i) = (1/N)·Σ_{k=1..N} s_k(i)²; (2) the sliding-window mean energy S of sequence E was determined with a window length of 64 ms; (3) the double thresholds Th_1 and Th_2 were selected adaptively according to the median and variance; (4) the data segments satisfying Th_2 < S < Th_1 were recorded as active segments, and their beginning and ending locations were established.
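The four steps above can be sketched as follows. The threshold factors `k_low` and `k_high` are illustrative assumptions; the paper only states that Th_1 and Th_2 are derived adaptively from the median and variance:

```python
import numpy as np

def active_segment_mask(x, win=64, k_low=0.5, k_high=3.0):
    """Adaptive double-threshold active-segment detection (sketch).

    x: array of shape (n_samples, n_channels) of sEMG data sampled at
    1000 Hz, so a 64-sample window corresponds to 64 ms.
    """
    d = np.diff(x, axis=0)                       # (1) difference each channel
    energy = (d ** 2).mean(axis=1)               #     mean square energy over N=8 channels
    kernel = np.ones(win) / win
    smoothed = np.convolve(energy, kernel, mode="same")   # (2) 64 ms sliding-window mean S
    med, std = np.median(smoothed), np.std(smoothed)
    th1, th2 = med + k_high * std, med + k_low * std      # (3) adaptive double thresholds
    return (smoothed > th2) & (smoothed < th1)   # (4) samples with Th2 < S < Th1 are active
```

The start and end of each run of `True` values give the beginning and ending locations of an active segment.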

Segmentation
In this study, active data were extracted using a sliding window with a 64 ms window length and a step size of 64 ms, and the segmented data were labeled. It is worth mentioning that the idle segment data were designated as "resting".
The data set was randomly partitioned into a training set and a test set at a ratio of 6:4. The classification model was trained using the training set as input, and its effectiveness was then confirmed on the test set.
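Since the window length and step are both 64 ms (64 samples at 1000 Hz), the windows are non-overlapping; a minimal segmentation sketch:

```python
import numpy as np

def segment_windows(x, win=64, step=64):
    """Slide a window of `win` samples with step `step` over the data.

    x: array of shape (n_samples, n_channels); returns an array of
    shape (n_windows, win, n_channels).
    """
    starts = range(0, x.shape[0] - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])
```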

One-Dimensional CNN-RNN
CNN is specialized for processing data with a grid-like structure, such as multidimensional time series and image data. It is a groundbreaking model in the field of deep learning and excels in a wide range of applications; CNN models have been the de facto standard for image classification problems for the past decade. Instead of relying on manual feature engineering, the network learns features directly from the input, a process known as feature learning. Although the model typically accepts two-dimensional input data in its internal form, similar steps can be taken for one-dimensional data sequences: one-dimensional CNNs apply feature learning to the raw data rather than hand-crafted features. Figure 4 depicts the fundamental 1D-CNN design.
RNN is a type of backpropagation neural network model for processing sequence data that can change its internal state via recurrent connections. This property allows the RNN to perform effectively when processing time-dependent signals such as speech and text. Figure 5 depicts the evolution of RNN structural units over time. The forward computation can be written as o_t = g(V·s_t) and s_t = f(U·x_t + W·s_{t−1}), where x is the input vector, s is the value of the hidden layer, o is the value of the output layer, U is the weight matrix from the input layer to the hidden layer, V is the weight matrix from the hidden layer to the output layer, and W is the weight matrix that feeds the previous hidden-layer value back as input at the current time step. As the RNN structure unit unrolls over time, the value of s_t is related to the preceding moments' inputs and hidden-layer weights.
When long-term memory is required, the solution of the RNN is related to the first n time steps:
s_t = f(U·x_t + W·f(U·x_{t−1} + W·f(U·x_{t−2} + ⋯ + W·f(U·x_{t−n}))))
When n is increased, the model computation grows exponentially, making model training much longer. In addition, when dealing with a long-term problem, the data lose some information at each step of the RNN traversal; because of the vanishing gradient, distant information has very little impact on the current instant, so the RNN state retains little trace of the initial input. As a result, the standard RNN model is unsuitable for long-term memory. However, as a variant of the RNN, LSTM has a significant advantage in handling this problem.
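To make the recurrence concrete, a minimal NumPy sketch of one vanilla RNN step (choosing f = tanh and g = softmax is an assumption for illustration; the dimensions are arbitrary):

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    """One step of a vanilla RNN: s_t = f(U x_t + W s_{t-1}), o_t = g(V s_t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)           # hidden state update (f = tanh)
    logits = V @ s_t
    o_t = np.exp(logits) / np.exp(logits).sum()   # output (g = softmax)
    return s_t, o_t
```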
i_t = σ(W_i·x_t + U_i·h_{t−1} + b_i), f_t = σ(W_f·x_t + U_f·h_{t−1} + b_f), o_t = σ(W_o·x_t + U_o·h_{t−1} + b_o), g_t = tanh(W_g·x_t + U_g·h_{t−1} + b_g), c_t = f_t ⊗ c_{t−1} + i_t ⊗ g_t, h_t = o_t ⊗ tanh(c_t),
where the vector h_t is the short-term state, c_t is the long-term state, x_t is the input, i, g, f, and o are the input gate, main layer, forgetting gate, and output gate, respectively, W is the weight matrix connected to x_t, and b is the bias.
Figure 7 depicts the one-dimensional convolutional recurrent neural network model (1D-CNN-RNN) created for this investigation. The neural network model consists of four modules. The first module consists of two 64-unit one-dimensional convolution layers with ReLU as the activation function, a batch normalization layer, and a maximum pooling layer with a window length of two. Similar to the first module, the second module has two 128-unit one-dimensional convolution layers using ReLU as the activation function, a batch normalization layer, a maximum pooling layer with a window length of two, and a dropout layer with a dropout rate of 0.2. The third module is composed of two LSTM cells with tanh as the activation function and two dropout layers with dropout rates of 0.2 and 0.5. The final module contains the Dense layer, Flatten layer, and Softmax layer. The fundamental idea behind this architecture is to merge the CNN and RNN, take full advantage of the CNN's benefits in feature extraction and multidimensional timing-signal processing, and add an LSTM structure for time memory to overcome the CNN's shortcomings in time delay.
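The four modules above can be sketched in Keras as follows. The convolution kernel size, padding, LSTM unit count, and layer ordering within the final module are assumptions (the text does not specify them); the 64-sample input window and 21 output classes (20 movements plus "resting") follow the Segmentation section, and the optimizer and loss follow the training configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1d_cnn_rnn(win=64, channels=8, n_classes=21, lstm_units=128):
    """Sketch of the four-module 1D-CNN-RNN; lstm_units and kernel size 3 are assumptions."""
    m = models.Sequential([
        layers.Input(shape=(win, channels)),
        # Module 1: two 64-unit 1D conv layers (ReLU) + batch norm + max pool (size 2)
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        # Module 2: two 128-unit 1D conv layers (ReLU) + batch norm + max pool + dropout 0.2
        layers.Conv1D(128, 3, padding="same", activation="relu"),
        layers.Conv1D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        layers.Dropout(0.2),
        # Module 3: two LSTM layers (tanh) with dropout rates 0.2 and 0.5
        layers.LSTM(lstm_units, activation="tanh", return_sequences=True),
        layers.Dropout(0.2),
        layers.LSTM(lstm_units, activation="tanh", return_sequences=True),
        layers.Dropout(0.5),
        # Module 4: Flatten + Dense + Softmax
        layers.Flatten(),
        layers.Dense(n_classes),
        layers.Softmax(),
    ])
    # Training configuration from the text: Adam at 1e-3, cross-entropy loss.
    m.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return m
```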

The classification results of the model are contrasted with those of the CNN and LSTM in order to examine the viability and advantages of the model in hand motion recognition based on sEMG. The same data set was used to train all three models. Each model was trained for 30 epochs with a batch size of 128, Adam's initial learning rate was set to 0.001, and the cross-entropy loss function served as the model's loss function.

Evaluation Metrics
Recall, accuracy, precision, and F1 score were the quantitative evaluation indices employed in this study to confirm the model's performance (the F1 score considers precision and recall simultaneously to achieve a maximum and a balance between them).
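These indices can be computed from the predicted and true labels as below (macro averaging over the 21 classes is an assumption; the text does not state the averaging scheme):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy and macro-averaged precision, recall, and F1 score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = np.mean(y_true == y_pred)
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f1)
    return acc, np.mean(precisions), np.mean(recalls), np.mean(f1s)
```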

Offline Result Analysis
The CNN, LSTM, and 1D-CNN-RNN were applied to optimize the performance of classification methods. The comparison performances of the evaluation indices of the training and test from 23 subjects based on the above three models are provided in Table 1.
Although the training process of the CNN took less time than the LSTM and 1D-CNN-RNN models, the test set's classification accuracy rate (85.43%) and loss value (0.4446) fell significantly short of the pattern recognition system's accuracy standards. The classification accuracy of the LSTM model was significantly improved compared to the CNN model, but it consumed the longest training time, as much as 3.33 times as long. The test-set accuracy of the 1D-CNN-RNN for the pattern recognition of 20 motion modes was 98.88%, outperforming the separate CNN and LSTM models. Moreover, the training time of this model was only 41% of that of the LSTM model. Consequently, the 1D-CNN-RNN performed best in recall, accuracy, and F1 score, with 98.88%, 98.96%, and 0.9896, respectively. As can be seen from Figure 8, compared with the CNN and LSTM models, the 1D-CNN-RNN model presented a faster convergence rate and remained stable after a shorter training period. The loss functions of the training set and test set of this model also showed a good overall fit. The CNN model, however, did not reach the convergence state within the same number of training rounds as the 1D-CNN-RNN. For the LSTM model, the loss function curve displayed an overall declining tendency; the loss value initially increased because of the challenging samples used in the training process, but after numerous training sessions it settled into a specific range and maintained oscillations.
Figure 8. Training curves of the three models: (a) loss function; (b) accuracy.

The 1D-CNN-RNN model performs admirably in the recognition of comparable movements, and the recognition accuracy for the 20 motion patterns reaches 97% or higher; the confusion matrix is depicted in Figure 9. As the above comparison shows, the 1D-CNN-RNN model achieves the best pattern recognition performance under the same pretreatment and the same neural network hyperparameter configuration, and the CNN model performs the worst. The advantage of the one-dimensional CNN-RNN model over the LSTM approach is that the convolution layer placed before the LSTM unit reduces the input's dimension, minimizes computation, and boosts efficiency. Convolutional-layer feature extraction benefits from the batch normalization implemented in the 1D-CNN-RNN model, and the additional dropout layer prevents overfitting, making the model structure more robust.

Figure 9. CNN-RNN model pattern recognition confusion matrix.
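A minimal PyTorch sketch of such a 1D-CNN-RNN layout, with the convolution front-end before the LSTM plus batch normalization and dropout, might look as follows (channel counts, kernel size, stride, and hidden size are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class CnnRnn(nn.Module):
    """Sketch of a 1D-CNN-RNN: a Conv1d front-end shrinks the time axis
    before the LSTM, with batch normalization and dropout for robustness."""
    def __init__(self, n_channels=8, n_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, stride=2),  # reduces input dimension
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (batch, channels, time)
        f = self.features(x)         # (batch, 32, time')
        f = f.transpose(1, 2)        # (batch, time', 32) for the LSTM
        out, _ = self.lstm(f)
        return self.head(out[:, -1])  # class logits from the last time step

model = CnnRnn()
logits = model(torch.randn(4, 8, 200))  # 4 hypothetical windows, 8 channels, 200 samples
```

The stride-2 convolution roughly halves the sequence length the LSTM must process, which is the computation saving the text refers to.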

Online Result Analysis

Before performing the online classification with the 1D-CNN-RNN model, we trained the model offline on the 10 subjects. According to Table 2, the offline recognition accuracy of the 1D-CNN-RNN model exceeds 98%, and the loss value, recall, and accuracy all reach good performances. These results are in line with the 1D-CNN-RNN training results obtained from the offline experiments, further demonstrating the practicability of using data from newly added subjects to train the original neural network model, which cuts down offline training time and increases the efficiency of online recognition.
The outcomes of the real-time recognition of the 20 motion patterns of the 10 participants are displayed in the histograms of Figures 10 and 11. The average recognition accuracy of the ten subjects is 91% ± 5%. The real-time recognition accuracy of the following movements is above 90%: rest, index and middle finger extension, five-finger grasp, four-finger pinch, four-finger stretch, fist clench, five-finger stretch, five-finger pinch, external wrist rotation, wrist flexion, and wrist extension. We speculated that these movements had better online recognition accuracy because their properties were evident and the applied muscular force was not easy to misinterpret. Among the thumb-related movements, thumb lateral adduction, thumb extension, index finger pinching, and three-finger pinching showed lower recognition accuracy. This might be because these thumb-related movements have a quite high degree of similarity; when the useful information in the sEMG signal was insufficient, the system was prone to motion pattern identification mistakes.
Figure 10. The recognition accuracy of each action in online recognition (the error bar represents the standard deviation).

Figure 11. The identification accuracy of each subject in online identification (S1–S10 are subject numbers; the error bar represents the standard deviation).

Discussion
The intelligent bionic manipulator regulated by sEMG signals currently offers a wide range of potential applications, and the focus of research is how to accurately extract motion information from sEMG and perform motion recognition. This study presented a 1D-CNN-RNN model to perform quick and precise multi-motion pattern recognition based on the "end-to-end" nature of deep learning models. Pattern recognition was performed on 23 subjects' sEMG signals in 20 forearm motion modes. The final evaluation results showed that the designed 1D-CNN-RNN model presented excellent performance: its accuracy reached 98.96%, and its recognition of similar actions was better than that of other common neural networks. The outcomes of the online test demonstrated that the 1D-CNN-RNN model put forth in this paper performs admirably in recognition accuracy for the majority of movements, with an average recognition accuracy of 91%.
In terms of time delay in online recognition, this model performed quite well. Table 3 shows the model performances in different studies. Among the models with stated real-time performance, the sliding windows of [33,34] are too long, which may significantly reduce the real-time classification performance of the model. According to previous studies [36], the human body can hardly feel a time delay of 0–300 ms, while the typical latency of this model is 153 ms. Hence, the 1D-CNN-RNN proposed in this study offered better real-time performance and high accuracy compared with other studies. That said, the sliding window length only partly reflects the real-time performance of [33-35]; the equipment used to collect the sEMG signals is also one of the factors affecting the delay of real-time recognition. Therefore, more research is required to determine whether the delay of real-time recognition can be decreased by enhancing the hardware quality.
Additionally, different handedness tends to produce dissimilar muscular force patterns in the upper limb. We wondered how handedness impacts recognition accuracy in the online movement classification, so we compared the recognition results under different handedness. As shown in Figure 11, the first subject (S1), a left-hander, has the lowest average recognition accuracy at just 79%. This is probably because the offline recognition model was trained on data from right-handers, while the left-hand data from the first subject were used to train the online experimental model; therefore, the recognition accuracy of the first subject was less than 85%. However, the accuracy of real-time recognition tended to rise with the number of subjects, and the accuracy of subsequent left-handed individuals, such as the second, third, and seventh subjects (S2, S3, S7), was much higher than that of the initial subject.
Their accuracies gradually improved, and the accuracies of S3 and S7 showed no significant difference from those of the right-handed subjects. These results indicate that the accuracy of neural network pattern recognition is higher with more training data.
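To make the windowing/delay trade-off discussed above concrete, here is a minimal sketch of overlapping-window segmentation together with a back-of-the-envelope delay estimate (the window length, step, sampling rate, and inference time below are hypothetical values; the paper's 153 ms is a measured figure, not derived from this formula):

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Segment a (time, channels) sEMG stream into overlapping analysis
    windows, as used for frame-by-frame online classification."""
    n = (signal.shape[0] - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

def worst_case_delay_ms(step, inference_ms, fs=1000):
    # Simple latency model (an assumption, not the paper's measurement
    # protocol): a new decision appears once per step, so a change in
    # movement waits at most one step plus the inference time.
    return step / fs * 1000 + inference_ms

stream = np.zeros((1000, 8))   # 1 s of hypothetical 8-channel sEMG at 1 kHz
frames = sliding_windows(stream, window=200, step=100)
delay = worst_case_delay_ms(step=100, inference_ms=53)
```

Under this model, longer windows or larger steps (as in [33,34]) directly inflate the worst-case decision delay, which is the concern raised above.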
The multi-movement classification outcomes in this study revealed that the identification accuracies of thumb upward, index finger pinching, three-finger pinching, and lateral thumb adduction were generally poor. This may be because these actions all involve the thumb and the corresponding hand muscles, which produce similar sEMG signals that are easily masked by those of the remaining fingers. The next stage will involve paying more attention to the details of the sEMG signals and further separating some distinctive activities. To address the shortcomings of using only sEMG signals, we will also consider incorporating other signals, given the advantages of joint angle, acceleration, signal acquisition location [38], and other information [39] in motion pattern recognition. Moreover, the proposed 1D-CNN-RNN model is suitable for the motion control of intelligent EMG prostheses, whereas the sEMG signals used for pattern recognition in this research were derived from healthy subjects. sEMG motion control and intensity in amputees differ from those of healthy individuals, so the present models, which are based on data from healthy people, do not always match amputees. To further enhance the pattern recognition algorithm framework and perform adaptive adjustments according to the actual situation of amputees, sEMG signals from amputees will be used in a subsequent study.

Conclusions
This study concludes that the proposed 1D-CNN-RNN model for motion pattern recognition achieves good classification performance based on multi-channel sEMG for 20 independent and combined finger and wrist movements. The average recognition accuracy of the designed 1D-CNN-RNN model reaches 98.96% in offline recognition, which is significantly higher than that of the CNN and LSTM (85.43% and 96.88%, respectively, p < 0.01), and the model achieves better comprehensive performance.
The key finding of this study is that the 1D-CNN-RNN model performs better in real-time recognition. This may be due to the 1D-CNN and LSTM components of the model possessing powerful capabilities of processing time-series data, as well as the addition of batch normalization and dropout layers that promote quick convergence and avoid overfitting.
Real-time pattern recognition experiments were carried out on ten subjects to examine the real-time performance of the developed 1D-CNN-RNN. The average real-time recognition rate is 91% ± 5%, and the average delay is 153 ms, which meets the needs of the real-time control of intelligent prosthetics. The proposed 1D-CNN-RNN classification model thus has significant advantages in real-time recognition accuracy, and since its average time delay falls within the range that produces no obvious sense of delay in the human body [36], it is expected to provide an efficient control method for EMG prosthetic hands. Furthermore, the stability and accuracy of real-time recognition will be examined in future studies.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (SIAT-IRB-221115-H0626).
Informed Consent Statement: All subjects gave their informed consent for inclusion before they participated in the study. Written informed consent has been obtained from the patient(s) to publish this paper.