Robust motion estimation with user-independent sEMG features extracted by correlated components analysis

Motion estimation from surface electromyogram (sEMG) signals has been studied extensively over the past decades. Nevertheless, it is challenging to apply a trained estimation model to novel subjects, since sEMG signals inherently contain user-dependent features that interfere with the estimation model and reduce the estimation accuracy. To achieve accurate motion estimation, a strategy combining correlated components analysis with a random forest regressor (CorrCA-RFG) is proposed. The proposed CorrCA-RFG first uses CorrCA to extract user-independent, motion-related features shared among multiple subjects and to obtain the projection vectors from the sEMG data to the motion-dependent feature space. Then, the RFG is trained on the user-independent sEMG features to establish the estimation model. To validate its effectiveness, the strategy was tested on a public dataset and in an experimental study and compared to three methods, namely a random forest regressor (RFG), a canonical correlation analysis-based random forest regressor (CCA-RFG), and a convolutional neural network (CNN). In both cases, the CorrCA-RFG outperformed the other three methods. These results demonstrate that the proposed CorrCA-RFG enables robust motion estimation by extracting user-independent sEMG features.


Introduction
The surface electromyogram (sEMG) signals detected by surface electrodes represent the electric potential field generated by muscle fiber contraction; 1 they are measured non-invasively and contain essential information on human motion. 2,3 [6] However, sEMG signals are user-dependent, mainly because of differences in the subcutaneous tissue and in the physiological cross-sectional area of the muscle, 7,8 so the measured signals vary between users even when they perform the same motion. This user-dependent nature leads to user-dependent estimation models [9][10][11] that generalize poorly and must be retrained for each new user. [14][15]
To address these issues, many efforts have been made to improve the generalization of estimation models across users. Firstly, some studies increased the size and diversity of the training data (more training users) to enhance the generalization ability of machine learning models, 16 such as random forest regression (RFG) and support vector regression (SVR), which is costly. In these methods, the generalization performance depends heavily on the training dataset. Secondly, deep learning and transfer learning methods have been used to estimate human motion. Deep learning methods can automatically learn features that share similar distributions across subjects. 17,18 ang et al. 19 proposed a convolutional neural network (CNN) structure with generalization capacity to estimate wrist movement, but the CNN contains a large number of parameters. Bao et al. 20 designed a two-stream CNN with a complex structure for supervised domain adaptation to reduce the domain shift effect. Transfer learning aims to extract knowledge from a source domain and apply it to a target domain. 21 To estimate the elbow torque from sEMG signals, Jiang et al. 22 proposed a correlation-based data weighting scheme based on transfer learning, which is unsupervised in the modeling stage. However, the above methods require a large dataset or a high computational cost.
To further improve the generalization of estimation models, some researchers have begun to use user-independent sEMG features to establish the models. 23,24 User-independent sEMG features are related to the motion and are strongly correlated among multiple users. In particular, they may share similar distributions across different subjects who perform the same motion, which makes them beneficial for generalization. Some studies applied canonical correlation analysis (CCA) to extract user-independent features and built generalized models on these features to estimate cross-user motion. Khushaba 25 applied CCA to project different users' data onto a unified-style space to overcome individual differences. Xue et al. 26 used CCA to extract the inherent user-independent properties of sEMG signals and applied optimal transport to further reduce the discrepancies between the transformed features of the training and testing sets. In these studies, CCA requires the canonical projection vectors to be orthogonal, which is not a reasonable assumption for sEMG analysis. In addition, the estimation performance of CCA-based methods depends on the chosen expert set, for which no selection standard has been well defined and which is usually chosen based on subjective experience.
In this study, we focused on extracting user-independent features without any expert sets and on building a new estimation strategy to improve the performance of continuous motion estimation from sEMG signals. Since correlated components analysis (CorrCA) extracts components that are maximally correlated among subjects, we first used CorrCA to extract the user-independent sEMG features. In addition, RFG can achieve excellent prediction performance with a smaller dataset than other machine learning methods. 27 Based on these advantages of CorrCA and RFG, we propose a new estimation strategy: CorrCA-RFG. Firstly, a single set of linear projections is obtained from multiple subjects' sEMG signals using CorrCA. The projection vectors transform all subjects' datasets into the same space to obtain user-independent features. Then, these extracted features are used to construct an RFG model to estimate human motion. To validate the effectiveness of the proposed method, we tested the CorrCA-RFG for estimating knee angles on both a public dataset 28 and our own experimental study. The main contribution of this study is a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features to generalize sEMG-based continuous motion estimation across multiple subjects.
The remainder of this paper is organized as follows. Section 2 presents the proposed CorrCA-RFG strategy. Section 3 describes the experimental study and the public dataset and presents the results on both. Sections 4 and 5 give the discussion and conclusion of this study, respectively.

Methods
In this section, the CorrCA-RFG strategy is first described in detail, as shown in Figure 1. Then, the processing of the sEMG signals, the sEMG feature extraction, and the performance measurement are introduced.

Correlated components analysis
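CorrCA seeks a single set of linear projections that maximizes the correlation among the N subjects. In other words, it can be employed to extract similar motion activity from the sEMG signals of multiple subjects. The data are given as $x_i^j \in \mathbb{R}^D$, where the subscript $i = 1, \cdots, N$ enumerates the subjects, the superscript $j = 1, \cdots, T$ enumerates the samples, and $D$ denotes the number of feature dimensions.
The goal of CorrCA is to seek a projection vector $w \in \mathbb{R}^D$ such that
$$z_i^j = w^{\top} x_i^j, \qquad (1)$$
where $z_i^j$ denotes the $j$-th projection of subject $i$. To find the projection vector, the inter-subject correlation (ISC) is maximized. It is defined as
$$\rho = \frac{\frac{1}{N-1}\sum_{i=1}^{N}\sum_{k=1,\,k\neq i}^{N}\sum_{j=1}^{T}\left(z_i^j-\bar{z}_i\right)\left(z_k^j-\bar{z}_k\right)}{\sum_{i=1}^{N}\sum_{j=1}^{T}\left(z_i^j-\bar{z}_i\right)^2}, \qquad (2)$$
where $\bar{z}_i = \frac{1}{T}\sum_{j=1}^{T} z_i^j$. Inserting (1) into (2) gives
$$\rho = \frac{w^{\top} p_b\, w}{w^{\top} p_w\, w}, \qquad (3)$$
where $p_b$ and $p_w$ represent the between-subject covariance and the within-subject covariance, respectively:
$$p_b = \frac{1}{N-1}\sum_{i=1}^{N}\sum_{k=1,\,k\neq i}^{N}\sum_{j=1}^{T}\left(x_i^j-\bar{x}_i\right)\left(x_k^j-\bar{x}_k\right)^{\top}, \qquad (4)$$
$$p_w = \sum_{i=1}^{N}\sum_{j=1}^{T}\left(x_i^j-\bar{x}_i\right)\left(x_i^j-\bar{x}_i\right)^{\top}, \qquad (5)$$
where $\bar{x}_i = \frac{1}{T}\sum_{j=1}^{T} x_i^j$. To maximize the ISC, differentiating (3) with respect to the projection vector $w$ and setting the result to zero yields the eigenvalue equation
$$p_b\, w = \rho\, p_w\, w. \qquad (6)$$
Supposing $p_w$ is invertible, $w$ is an eigenvector of the matrix $p_w^{-1} p_b$ with eigenvalue $\rho$. The optimal projection is the eigenvector of $p_w^{-1} p_b$ with the largest eigenvalue. More details about CorrCA can be found in Parra et al. 29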
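The eigenvalue problem described in this subsection can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `corrca`, the array layout (subjects, samples, feature dimensions), the 1/(N-1) scaling of the between-subject term, and the small ridge term `reg` that keeps the within-subject covariance invertible are all choices made for this example.

```python
import numpy as np


def corrca(X, reg=1e-6):
    """Sketch of CorrCA for an array X of shape (N, T, D).

    Returns (W, rho): the columns of W are projection vectors, sorted so
    that rho (the inter-subject correlation of each projection) decreases.
    """
    N, T, D = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # subtract each subject's mean
    p_w = np.einsum("itd,ite->de", Xc, Xc)      # within-subject covariance
    S = Xc.sum(axis=0)                          # (T, D) sum over subjects
    # Cross terms between different subjects (assumed 1/(N-1) scaling).
    p_b = (S.T @ S - p_w) / (N - 1)
    # Assumption: a small ridge keeps p_w safely invertible.
    p_w_reg = p_w + reg * np.trace(p_w) / D * np.eye(D)
    # Solve p_b w = rho p_w w through a Cholesky whitening step, which
    # turns the problem into a symmetric eigenvalue problem.
    L = np.linalg.cholesky(p_w_reg)
    M = np.linalg.solve(L, np.linalg.solve(L, p_b).T)
    rho, V = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(rho)[::-1]               # largest eigenvalue first
    W = np.linalg.solve(L.T, V[:, order])       # map back: w = L^{-T} v
    return W, rho[order]
```

The first column of `W` corresponds to the largest eigenvalue, that is, the projection with the highest inter-subject correlation. The whitening step is used here instead of forming the inverse of the within-subject covariance explicitly, which is a numerical convenience rather than part of the method description.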

CorrCA-RFG strategy
Taking advantage of CorrCA for multi-subject analysis, we propose the CorrCA-RFG strategy for cross-subject sEMG-based motion estimation, shown in Figure 1. Specifically, to reject noise and unimportant sEMG signals for each subject, 30 sEMG features were first extracted for each channel according to the gait cycle. Secondly, the features from different subjects in the same channel were concatenated into matrices, $x_k \in \mathbb{R}^{T \times D \times N}$, where T, D, and N represent the number of samples, the number of feature dimensions, and the number of subjects, respectively. Thirdly, for each channel, the feature projection vector $w_k$ that maximizes the correlation between subjects was obtained from these concatenated matrices by CorrCA. The projection vectors transform the sEMG features of all subjects into the same space. Finally, all of the transformed features were used as training data to train an RFG model. For novel subjects, the proposed CorrCA-RFG strategy can improve the motion estimation performance without reconstructing the RFG model: their features only need to be transformed into the same space as the training features by the obtained projection vectors.
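The steps of the strategy can be sketched with NumPy and scikit-learn as below. This is a hedged illustration only: the function names, the array layout (channels, subjects, samples, feature dimensions), the number of retained components `n_comp`, the ridge term `reg`, and the forest size are assumptions for the example and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def corrca_projection(X, n_comp=2, reg=1e-6):
    """X: (N, T, D) features of one channel. Returns a (D, n_comp) matrix."""
    N, T, D = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    p_w = np.einsum("itd,ite->de", Xc, Xc)       # within-subject covariance
    S = Xc.sum(axis=0)
    p_b = (S.T @ S - p_w) / (N - 1)              # between-subject covariance
    # Assumption: small ridge for invertibility, then Cholesky whitening.
    L = np.linalg.cholesky(p_w + reg * np.trace(p_w) / D * np.eye(D))
    M = np.linalg.solve(L, np.linalg.solve(L, p_b).T)
    _, V = np.linalg.eigh((M + M.T) / 2.0)       # eigenvalues ascending
    W = np.linalg.solve(L.T, V[:, ::-1])         # largest eigenvalue first
    return W[:, :n_comp]


def project(feats, W):
    """feats: (K, N, T, D); W: list of K matrices. Returns (N*T, K*n_comp)."""
    K, N, T, D = feats.shape
    return np.hstack([feats[k].reshape(N * T, D) @ W[k] for k in range(K)])


def fit_corrca_rfg(feats, angles, n_comp=2, seed=0):
    """feats: (K, N, T, D) training features; angles: (N, T) joint angles."""
    W = [corrca_projection(Xk, n_comp) for Xk in feats]   # one per channel
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(project(feats, W), angles.reshape(-1))
    return W, model


def predict_novel(feats_new, W, model):
    """feats_new: (K, T, D) features of one unseen subject."""
    # The novel subject is only projected; the forest is not retrained.
    return model.predict(project(feats_new[:, None], W))
```

In this sketch a novel subject's features pass through the stored per-channel projections and the already trained forest, which mirrors the statement that the RFG model does not need to be reconstructed.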

Signal processing
A Butterworth band-pass filter (10-500 Hz) was applied to pre-process the sEMG signals before feature extraction. 31,32 Next, the pre-processed sEMG signals were normalized by their maximum amplitude. Then, overlapping time windows generated by a sliding window were applied to extract features from the normalized signals. To ensure the same number of feature samples over a gait cycle for all subjects, the sEMG features in this study were extracted by segmenting the signals according to the gait cycle. Let TW and SW denote the time window length and the sliding window length, respectively, and let GL denote the length of one gait cycle.
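The pre-processing and gait-cycle segmentation can be sketched as follows. The exact relation between TW, SW, and GL is not reproduced here, so this sketch assumes that TW and SW are fixed fractions of GL (`tw_frac` and `sw_frac`, both hypothetical parameters), which gives every gait cycle the same number of windows regardless of its length. The filter order and the zero-phase filtering are likewise assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess(emg, fs, low=10.0, high=500.0, order=4):
    """Band-pass filter (10-500 Hz) and normalize by the maximum amplitude."""
    # Assumption: 4th-order Butterworth design applied forward and backward.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, emg)
    return filtered / np.max(np.abs(filtered))


def gait_cycle_windows(GL, tw_frac=0.1, sw_frac=0.05):
    """Return (start, end) sample indices of the windows in one gait cycle.

    GL is the gait cycle length in samples. TW and SW are assumed to be
    fixed fractions of GL, so the window count does not depend on GL.
    """
    TW = int(np.floor(tw_frac * GL + 1e-9))
    n_win = int(np.floor((1.0 - tw_frac) / sw_frac + 1e-9)) + 1
    starts = [int(np.floor(j * sw_frac * GL + 1e-9)) for j in range(n_win)]
    return [(s, s + TW) for s in starts]
```

With the default fractions, each gait cycle yields 19 windows, whether the cycle spans 800 or 1237 samples, so feature samples from different subjects line up by gait phase.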

Feature extraction
In each time window, 12 time-domain features and 2 frequency-domain features were extracted from each of the six sEMG channels. Specifically, the time-domain features consist of integrated EMG, mean absolute value, mean, root mean square, variance, kurtosis, skewness, zero crossing, slope sign change, and autoregressive model coefficients. The frequency-domain features include median frequency and mean frequency. 33 For each channel, the extracted features were concatenated into a single vector as one sample, $x_{ik}$, where i denotes the i-th subject and k denotes the k-th channel.
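A few of the listed features can be computed per window as in the sketch below. It covers only a subset (integrated EMG, mean absolute value, root mean square, variance, zero crossing, slope sign change, mean frequency, and median frequency) and uses common textbook definitions without amplitude thresholds; the function names and the periodogram-based spectrum are assumptions for illustration, not the definitions used in the paper.

```python
import numpy as np


def time_domain_features(x):
    """Subset of time-domain features for one window of one channel."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    return {
        "iemg": np.sum(np.abs(x)),               # integrated EMG
        "mav": np.mean(np.abs(x)),               # mean absolute value
        "rms": np.sqrt(np.mean(x ** 2)),         # root mean square
        "var": np.var(x, ddof=1),                # sample variance
        "zc": int(np.sum(x[:-1] * x[1:] < 0)),   # zero crossings
        "ssc": int(np.sum(dx[:-1] * dx[1:] < 0)),  # slope sign changes
    }


def frequency_domain_features(x, fs):
    """Mean and median frequency from a simple periodogram."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mean_f = np.sum(freqs * psd) / np.sum(psd)
    cum = np.cumsum(psd)
    # Median frequency: first bin where cumulative power reaches half.
    median_f = freqs[np.searchsorted(cum, 0.5 * cum[-1])]
    return {"mnf": mean_f, "mdf": median_f}
```

Concatenating the values returned for one window gives one feature sample for that channel; the remaining features in the list would be appended in the same way.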

Performance measurement
The normalized root-mean-squared error (NRMSE) and the correlation coefficient (CC) were used to assess the estimation performance. The NRMSE represents the error between the measurement and the estimation. The CC represents the match between the measurement and the estimation. NRMSE values close to 0 and CC values close to 1 indicate good estimation performance. The NRMSE is defined as:
$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}}{y_{\max}-y_{\min}},$$
where $n$ denotes the total number of samples, $y$ and $\hat{y}$ represent the measured and estimated angles, and $y_{\max}$ and $y_{\min}$ are the maximum and minimum values of the measured angles.
The CC is defined as:
$$\mathrm{CC} = \frac{C_{y\hat{y}}}{\sigma_y\,\sigma_{\hat{y}}},$$
where $C_{y\hat{y}}$ denotes the covariance between the measured and estimated angles, and $\sigma_y$ and $\sigma_{\hat{y}}$ represent the standard deviations of the measured and estimated angles, respectively.
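Both criteria are straightforward to compute. The sketch below follows the verbal definitions in this subsection (RMSE divided by the range of the measured angles, and covariance divided by the product of the standard deviations); the function names are chosen for this example.

```python
import numpy as np


def nrmse(y, y_hat):
    """RMSE between measured and estimated angles, normalized by the
    range of the measured angles."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    return rmse / (y.max() - y.min())


def cc(y, y_hat):
    """Pearson correlation coefficient between measurement and estimate."""
    return np.corrcoef(y, y_hat)[0, 1]
```

For example, an estimate that is offset from the measurement by a constant has a CC of 1 but a nonzero NRMSE, which is why the two criteria are reported together.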

Experiments and results
This section compares the proposed CorrCA-RFG with the conventional RFG, CCA-RFG, and CNN for knee angle estimation on both a public dataset and an experimental study. In the conventional RFG, the original sEMG features were fed directly into the RFG to train the model. In the CCA-RFG, for each subject, CCA was used to extract, from the original sEMG features, the training features correlated with the expert features. 25 Then, these extracted training features were fed into the RFG. Leave-one-subject-out cross-validation was applied. Specifically, one subject was picked as the novel subject for testing each time, and the data from all remaining subjects were used for training.
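The leave-one-subject-out procedure can be sketched as a small generator over subject labels. The function name and the representation of the data as one subject label per sample are assumptions for this example.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each subject.

    subject_ids holds one subject label per sample.
    """
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == s)    # the novel subject
        train_idx = np.flatnonzero(subject_ids != s)   # all other subjects
        yield s, train_idx, test_idx
```

Each fold trains on every subject except one and evaluates on the held-out subject, so the number of folds equals the number of subjects.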

Data acquisition

Estimation results on the public dataset
We first tested the proposed CorrCA-RFG on the public dataset and compared its results to the other three methods. The mean values of NRMSE and CC are presented in Table 1, which shows that: (1) The NRMSE values of CorrCA-RFG were lower than those of the RFG on 7 out of the 10 subjects, and the CC values of CorrCA-RFG were higher than those of the RFG on 6 out of the 10 subjects, suggesting that the proposed CorrCA was effective. (2) The CorrCA-RFG outperformed CCA-RFG (lower NRMSE values and higher CC values) on 9 out of the 10 subjects, suggesting that CorrCA could be more effective than CCA. (3) The NRMSE values of CorrCA-RFG were lower than those of the CNN on 7 out of the 10 subjects, and the CC values of CorrCA-RFG were higher than those of the CNN on 8 out of the 10 subjects, suggesting that the CorrCA-RFG could be more effective than the CNN.
Figure 3 shows the distribution of the evaluation criteria of the knee angle estimation across all subjects (N = 10) for the four methods. Outliers of NRMSE or CC existed in RFG, CCA-RFG, and CNN but not in CorrCA-RFG, suggesting that the estimation of the CorrCA-RFG was more stable. The Kruskal-Wallis test was also used to determine whether there were differences between the proposed CorrCA-RFG and the other three methods. The NRMSE values of CorrCA-RFG were significantly lower than those of CCA-RFG and CNN (p < 0.001 and p = 0.039). The CC values of CorrCA-RFG were significantly higher than those of RFG, CCA-RFG, and CNN (p = 0.002, p < 0.001, and p = 0.039). All the above results show that the proposed CorrCA-RFG outperformed the other three methods on the public dataset. In the table, the case with the highest estimation performance for each subject is shown in bold.
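A pairwise Kruskal-Wallis comparison of one method against the others can be sketched with SciPy as below. The helper name `compare_to_reference` and the dictionary layout (method name mapped to per-subject scores) are assumptions for this example, and any values passed to it here are placeholders rather than the results reported in the tables.

```python
import numpy as np
from scipy.stats import kruskal


def compare_to_reference(scores, reference):
    """Kruskal-Wallis p-value of the reference method against each other one.

    scores maps a method name to its per-subject values (NRMSE or CC).
    """
    ref = np.asarray(scores[reference], dtype=float)
    return {
        name: kruskal(ref, np.asarray(vals, dtype=float)).pvalue
        for name, vals in scores.items()
        if name != reference
    }
```

A returned p-value below 0.05 would correspond to an asterisk in the box plots, under the significance threshold stated in the figure captions.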

Estimation results on experimental study
We also tested the CorrCA-RFG in the experimental study and compared its performance with the other three methods. The knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN are illustrated in Figure 4. The estimation profile of CorrCA-RFG is closer to the reference angles calculated from the IMUs than those of the other methods. The CCA-RFG has the least smooth estimation profile, with sudden deviations and spikes. For each subject, the average NRMSE and CC values are presented in Table 2.
Figure 5 displays the distribution of the evaluation criteria of the knee angle estimation across all subjects of the experimental study for the four methods. The NRMSE and CC values of CorrCA-RFG were concentrated around 17% and 80%, respectively, suggesting that the estimation of CorrCA-RFG was more robust across all subjects. The NRMSE and CC values of CCA-RFG were widely distributed, indicating that the CCA-RFG was unstable. The outliers in the NRMSE values (RFG and CNN) and the CC values (CNN) reflect the instability of these methods. The Kruskal-Wallis test was also conducted to identify differences in NRMSE and CC between the proposed CorrCA-RFG and the other three methods in the experimental study. The NRMSE values of CorrCA-RFG differed significantly from those of RFG (p < 0.001), CCA-RFG (p < 0.001), and CNN (p < 0.001). The CC values of CorrCA-RFG were significantly higher than those of the other three methods (RFG, p < 0.001; CCA-RFG, p < 0.001; CNN, p < 0.001). All the above results indicate that the proposed CorrCA-RFG also outperformed the other three methods in the experimental study.
It is worth noting that for a small number of subjects (both in the public dataset and in the experimental study), CorrCA did not improve the estimation performance. Some possible reasons are discussed in the next section.

Discussion
In this paper, we have provided a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features for the generalization of sEMG-based motion estimation across subjects. Although CorrCA has been used in earlier work, 29,34,35 there are no studies of its effectiveness on sEMG signals. CorrCA can transfer the features from multiple subjects into the same space without any expert sets. To validate the effectiveness of the proposed CorrCA-RFG, it was tested on both a public dataset and an experimental study. We evaluated the CorrCA-RFG in terms of two criteria, namely NRMSE and CC, and compared it to three other methods: RFG, CCA-RFG, and CNN. The estimation results presented in Figures 3 to 5 and Tables 1 and 2 indicate that the proposed CorrCA-RFG improved the estimation performance compared to RFG, CCA-RFG, and CNN. In addition, the NRMSE and CC values of the CorrCA-RFG were more concentrated than those of the other three methods, suggesting that the results of the CorrCA-RFG were more robust.
To qualitatively compare the sEMG features before and after CorrCA, a nonlinear dimensionality reduction technique called t-distributed stochastic neighbor embedding (t-SNE) 36 was used. As mentioned in the previous section, trials from one subject and the combined trials from the remaining subjects were used as the test set and the training set, respectively. The t-SNE visualizations of subject 5 in the public dataset and subject 1 in our experimental study are shown in Figure 6, where the red and cyan dots represent the testing and training sets, respectively. In Figure 6, the left panels present the sEMG features before CorrCA and the right panels present them after CorrCA. It can be observed that the features (both training features and testing features) after
The NRMSE and CC values of CCA-RFG were more widely distributed than those of CorrCA-RFG and the RFG (shown in Figures 3 and 5), indicating that the estimation performance of CCA-RFG was worse than that of CorrCA-RFG and even that of the RFG. The main reason may be that CorrCA yields a single set of linear projections from multiple subjects to transfer their sEMG features into the same space, whereas CCA computes a different set of projection vectors for each subject and thus transfers the sEMG features into different spaces. Features from different spaces may have different distributions across subjects. In addition, because CCA can only find the projection vectors between two data matrices, an expert set must be selected to yield the projection vectors for multiple subjects. However, how to choose the expert set has not been well defined, and it is usually chosen based on experience.
The comparison between the CorrCA-RFG and CNN showed that the estimation performance of CNN is unstable.High uncertainties possibly exist in the training process of CNN, which leads to the learned features not always sharing a similar distribution.
This paper has proposed a CorrCA-RFG for the generalization of sEMG-based motion estimation across subjects, which has two desirable properties: (1) it extracts user-independent sEMG features among multiple subjects directly, without any expert sets; and (2) it does not need any information from the new subjects. Knee angle estimation on the public dataset and our own experimental study verified the effectiveness of the CorrCA-RFG. However, the CorrCA-RFG still has some limitations. It actually degrades the estimation accuracy for a small number of subjects from the public dataset or the experimental study. It is possible that the motion-dependent components in the sEMG signals are more complex for some specific subjects and cannot be adequately extracted by CorrCA. For these subjects, the projection vectors obtained from different subjects may even cause a concept shift among the transformed features. This could be one of the reasons why the proposed CorrCA-RFG improved the estimation performance on most but not all subjects in the public dataset and the experimental study. Another possible reason is that the data of these subjects contained outliers, 37 which the CorrCA-RFG could not deal with.

Conclusion
This study focused on improving the accuracy of human joint angle estimation so that practical applications become more robust to user-dependent influence. For this purpose, it provides a new strategy, namely CorrCA-RFG, that extracts user-independent sEMG features without any expert sets. The proposed strategy was evaluated on a public dataset (containing 10 healthy subjects) and in an experimental study (containing 12 healthy subjects). The results on both the public dataset and the experimental study show that the proposed CorrCA-RFG outperforms the three compared methods, namely RFG, CCA-RFG, and CNN. These results indicate that the proposed estimation strategy has significant potential benefits over current estimation models for joint angle estimation with robotic devices in practice.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
36 cm) were recruited to perform walking experiments on a treadmill. All of them were informed of the procedure and signed informed consent before the experiments. The experiments were approved by the local ethics committee of Nankai University. Each subject was asked to walk for 1 min at a speed of 1.25 m/s per trial. In total, 11 trials were performed by each subject, with 3 min of rest between trials to avoid muscle fatigue. During the experiments, the knee angle signals and sEMG signals of the 12 subjects were recorded by the device shown in Figure 2. Knee angle signals were recorded with 2 inertial measurement units (IMUs) at a data acquisition frequency of 100 Hz. Simultaneously, sEMG signals from the 6 muscles mentioned above were collected by the Delsys Bagnoli system at a sampling rate of 5 kHz.

Figure 2. Experimental setup: (a) schematic diagram of the experimental setup and (b) photograph of the experiment.

Figure 3. Distribution of the evaluation criteria of the knee angle estimation across all subjects (N = 10, on the public dataset) for the CorrCA-RFG, RFG, CCA-RFG, and CNN. (a) The distribution of NRMSE values for each method. (b) The distribution of CC values for each method. The box plots show the distribution of quartiles and outliers. Asterisks indicate statistical significance based on the Kruskal-Wallis test. *p < 0.05.

Figure 4. Normalized knee angles of subject 11 in the experimental study. Dashed and solid lines are the reference and estimated angles, averaged across 183 strides during the subject's walking.

Figure 5. Distribution of the evaluation criteria of the knee angle estimation across all subjects (N = 12, on the experimental study) for the CorrCA-RFG, RFG, CCA-RFG, and CNN. (a) The distribution of NRMSE values for each method. (b) The distribution of CC values for each method. The box plots show the distribution of quartiles and outliers. Asterisks indicate statistical significance based on the Kruskal-Wallis test. *p < 0.05.

Figure 6. t-SNE visualization of subject 5 from the public dataset and subject 1 from the experimental study before and after CorrCA: (a) the public dataset and (b) the experimental study. Red dots: testing set; cyan dots: training set.
In this paper, a public subject dataset and an experimental study were used to validate and analyze the proposed CorrCA-RFG strategy.
Public subject dataset. In the public dataset, the gait data of 10 healthy subjects (all males) running on a treadmill at a speed of 2 m/s contain marker trajectories, ground reaction forces and moments, and sEMG signals. In total, 54 reflective markers were placed on each subject, and their positions were recorded at 100 Hz using 8 Vicon MX40+ cameras. A Delsys Bagnoli System was used to measure the sEMG signals from 10 muscles: soleus, lateral gastrocnemius, medial gastrocnemius, tibialis anterior, biceps femoris long head, vastus medialis, vastus lateralis, rectus femoris, gluteus maximus, and gluteus medius. More details about the public dataset are provided in Hamner and Delp. 28 In this study, to be consistent with our experimental study, only six sEMG signals mainly related to knee flexion/extension were used, including biceps femoris (BF), lateral gastrocnemius (LG), medial gastrocnemius (MG), semitendinosus (ST), vastus lateralis (VL), and vastus medialis (VM).

Table 1. Evaluation criteria of the accuracy of knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN for each subject on the public dataset.

Table 2. Evaluation criteria of the accuracy of knee angles estimated by CorrCA-RFG, RFG, CCA-RFG, and CNN for each subject on the experimental study.