3D Reaching Movements Prediction of Upper-limb Joints Based on Deep Neural Networks

Background: The reaching test is widely adopted in motor function assessment for stroke rehabilitation. To evaluate motor disorders quantitatively, it is important to measure the differences between reaching movements made by healthy people and those made by patients. A movement prediction model should therefore first be established on healthy people as a customized benchmark. Methods: We designed a simplified kinematic model of the human upper limbs in which seven main joints of both the dominant and non-dominant sides were extracted. With this model, reaching movement data were collected from a healthy participant. A deep neural network (DNN) was trained on this dataset and then used to predict the 3D movements of the upper-limb joints of a healthy participant. Results: The predicted trajectories of the dominant side were highly similar to the trajectories of the real movements, with coupling distances of around 60 mm, 50 mm, 30 mm, 30 mm and 20 mm for the hand, elbow, shoulder, 7th cervical vertebra and 8th thoracic vertebra, respectively. The results for the non-dominant side were less accurate than those for the dominant side but still showed relatively short coupling distances. Conclusions: The DNN model achieved promising accuracy in estimating 3D movements of the upper limb. Because it can identify specific reaching movements as a dynamic process, a customized benchmark established by data-driven methods could be used to inform rehabilitation assessment and training in future studies.


Background
Stroke is the leading cause of adult disability in the world [1]. With the increasing number of patients suffering from stroke, a more reliable approach to their rehabilitation assessment and training is needed [2]. Motor function assessment plays a significant role in rehabilitation because it directly reflects the effectiveness of the rehabilitation therapy [3,4]. To provide a reliable assessment of motor function, the understanding of motor control and of the trajectory planning strategies of the central nervous system (CNS) needs to be improved [5]. Reaching is one of the simplest movements of daily life and has frequently been used in motor function assessment for patients with motor disorders [6,7,8,9]. A better understanding of normal reaching movement would therefore improve motor function assessment. More specifically, if we knew how healthy people move during a given reaching task, we would be able to measure quantitatively the difference between normal reaching and a patient's reaching.
Previous studies have verified that the CNS does not control the limbs or plan the movement randomly during reaching [10,7,8]. Among the infinite number of trajectories that can be chosen because of the redundancy of the upper-limb motor system [11,10,7,8], the CNS is considered to minimize movement-related costs in the control of the upper limbs [12]. Consequently, the trajectories of unconstrained reaching at natural speed are repeatable within the same reaching task [13,14]. In other words, there is an optimal trajectory selected by the CNS when a person performs a specific motor task. Although exactly which strategies are employed remains largely unclear, it is possible to model the reaching process and predict the movement, which would prove useful in rehabilitation assessment and training.
Earlier studies have reported many kinematic and dynamic models that aim to uncover the control strategies behind the reaching movement. The vast majority of these models generate the predicted trajectory by optimizing a criterion function. Flash and Hogan proposed the minimum-jerk model, which minimizes the rate of change of hand acceleration [15]. Hasan then reported a model that minimizes the effort of the joints [16]. Todorov presented an approach that minimizes the reaching error and the energy cost together [17]. Pham reported a minimum-acceleration model for goal-oriented locomotion, adapted from the smoothness maximization models [18]. Dongsung and Terrencer presented a power law [19] and a smoothness maximization model to generate the target trajectory [20]. DeWolf presented a spiking neuron model to simulate planar arm movement [21]. Dounskaia proposed a cost function that represents the neural effort for joint coordination [22]. Although these models provide information about the potential laws employed by the CNS, it is still hard to identify the complex motor system of the upper limb, so accurate prediction results that could support rehabilitation assessment are lacking.
Deep learning (DL) techniques have been widely applied in various fields because of their strong capability in complex system identification. Many recent studies have employed DL approaches to develop models that predict the trajectory of upper-limb reaching movement. Bernabucci presented a bioinspired artificial neural network (ANN) model to predict planar movement of the human upper arm [23]. Genc reported a new convolutional neural network (CNN) structure to overcome the scalability and robustness difficulties in complex system identification [24]. Gilra proposed a non-linear dynamics prediction model using recurrent spiking neural networks [25]. Tieck trained a liquid state machine (LSM) with reinforcement learning (RL) to learn continuous muscle control in a target reaching task [26]. Lang presented a multilayer Gaussian process (GP) model [27]. However, these models still focus on planar movement or on 3D movement of a single joint, whereas customized motor function assessment for stroke requires a model that predicts the 3D movements of multiple joints of the upper limb.
This paper investigates a data-driven method for predicting the 3D movements of multiple joints of the upper limbs during reaching. First, we established a simplified kinematic model and extracted seven main joints of both the dominant and non-dominant sides of the upper limbs. Based on this model, a deep neural network was built and trained with the movement data from one healthy participant. The experimental results indicated that the DNN model performed well in predicting upper-limb movement. Its performance on the dominant side was better than on the non-dominant side. Additionally, for both the dominant and non-dominant sides, the prediction results for the hand and elbow were better than those for the shoulder, cervical spine and trunk.

Simplified Skeleton Structure
Based on the upper-limb framework proposed by the International Society of Biomechanics (ISB) [28], a simplified kinematic model of the human upper limb is defined. It contains seven segments: the trunk, the right and left upper arms, the right and left forearms, and the right and left shoulders. These segments are connected by seven main joints of the upper limbs. Only the connections at the elbow and at the 8th thoracic vertebra are joints with one degree of freedom; all other connections are joints with three degrees of freedom. Fig. 1a shows one side of the kinematic model used in this study, including the main joints on that side of the upper limb. Based on this kinematic model, an upper-limb marker model (a set of marker placements) is presented in Fig. 1b. It uses 19 reflective markers to identify the kinematics required by the movement prediction model.

Experiment Setup
This study was approved by the University Ethics Committee (Reference MEEC 18-005). The data collection experiment involved one healthy, right-hand-dominant, 25-year-old male participant without motor disorder. The participant was asked to complete six blocks of tests for each of the dominant and non-dominant sides. Every block contained 20 free reaching tasks at natural speed, giving a total of 120 repetitions for each side. The 19 reflective markers were placed on the skin to identify the segments and joints defined in the kinematic model. A Universal Robot (UR) was employed to present the targets to the participant during the reaching tasks.
The participant sat on a chair with a seatbelt fastened to maintain his start position and to restrain the movement of his lower limbs. First, the participant was asked to move his arm freely in 3D space, but without trunk movement or shoulder displacement, so that the range of motion of his arms could be measured. Then, a set of target positions was determined based on this range of motion. Some of the targets were placed within the range of motion and some outside it. Fig. 2a shows the motion captured by the Vicon system. Fig. 2b shows what the participant was asked to do in the task.
At the beginning of each task, a new target position was given by the UR, and an audio cue was used to tell the participant when to start the task. After completing each task, the participant was given a short period to relax. During every task, the Vicon Motion Capture system (Vicon Motion Systems, Ltd., Oxford, England) with 8 Vero cameras was used to record the trajectories, velocities and accelerations of all markers along the three axes (x, y, z) at a sampling rate of 250 Hz. After each block, there was a longer rest period so that muscle fatigue would not affect the participant's performance.

Data Pre-processing
The complete reaching movement consists of two stages: the forward reaching stage and the recovery stage. Given the scope of this study, only the data from the first stage were extracted and used to train our data-driven model. First, the velocity profile of marker 16 (for the dominant side) or marker 11 (for the non-dominant side) during the reaching movement was calculated, as shown in Fig. 3a. Based on this velocity profile, the movement onset and offset instants of this stage, t_on and t_off, were both determined at 5% of the maximum velocity [29,30]. The extracted stage is the bell-shaped profile shown in yellow in Fig. 3a.
[Figure caption: An example of the extracted and whole dominant-hand reaching movement.]
To apply the DNN model, the original reaching data of each side were restructured into a specific form. From the extracted data samples, the target position and the participant's initial pose (the positions of C7, T8, and the test-side shoulder, elbow and hand) were identified for every reaching movement and used as the input of the DNN model. For each data sample, the input is an 18 × 1 feature matrix (FM): three coordinates for the target plus three for each of the five joints. The input FMs of all data samples can therefore be stacked into one batch and reshaped into an 18 × 1 × N matrix that serves as the input (X) of our model, where N is the total number of data samples. Moreover, 100 frames were re-sampled from every forward reaching stage. Each frame contains the 3D coordinates of C7, T8, shoulder, elbow and hand, so the coordinates of each frame can be described as a 5 × 3 matrix, and the 100 frames together form a 5 × 300 coordinate matrix. The coordinate matrix of each data sample was reshaped into a 1500 × 1 matrix. Stacking the coordinate matrices of all data samples gives a 1500 × 1 × N matrix as the expected output (Y) of our model, where N is the number of data samples collected for that side. In this way, the input matrix X and the output matrix Y of the DNN model were determined for each side.
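The extraction and reshaping steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the walk outwards from the velocity peak, the linear interpolation used for re-sampling, and the frame-major ordering of the 1500 output values are all assumptions, since the text does not specify them.

```python
import math


def movement_window(speed, fraction=0.05):
    """Return (onset, offset) indices around the main velocity peak.

    Walks outwards from the peak while the speed stays at or above
    `fraction` of the peak speed (5% in the text).
    """
    peak = max(range(len(speed)), key=speed.__getitem__)
    threshold = fraction * speed[peak]
    onset = peak
    while onset > 0 and speed[onset - 1] >= threshold:
        onset -= 1
    offset = peak
    while offset < len(speed) - 1 and speed[offset + 1] >= threshold:
        offset += 1
    return onset, offset


def resample(frames, n_out=100):
    """Linearly interpolate a list of equal-length frames to n_out >= 2 frames."""
    n_in = len(frames)
    if n_in == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)   # fractional index into the input
        lo = int(math.floor(pos))
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def build_sample(target, frames, speed):
    """Build one (x, y) pair.

    target: 3 target coordinates.
    frames: per-frame 15-vectors (C7, T8, shoulder, elbow, hand; x, y, z each).
    speed:  per-frame speed of the hand marker.
    """
    onset, offset = movement_window(speed)
    segment = frames[onset:offset + 1]
    x = list(target) + list(segment[0])                     # 3 + 15 = 18 inputs
    y = [v for frame in resample(segment) for v in frame]   # 100 * 15 = 1500 outputs
    return x, y
```

Calling `build_sample` once per recorded reach and stacking the results would give the X and Y matrices described in the text.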

DNN Architecture
The DNN model contains one input layer, one output layer, three fully-connected (FC) layers and one dropout layer. The input layer has 18 units to read the initial FM. The first hidden layer is fully connected to the input layer and has 256 units with the ReLU activation function. After this, a dropout layer is applied to reduce over-fitting. The third layer is also an FC layer, but with 750 units activated by the sigmoid function. The last FC layer has 750 units with ReLU. Finally, an output layer with 1500 units is fully connected to the last FC layer. Fig. 4 shows the architecture of this model.
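To make the layer sizes concrete, the following sketch runs an inference-time forward pass through a network with the dimensions given above, using only the Python standard library. It is an illustration under assumptions: the weight initialisation, the linear output activation and all names are hypothetical (the text does not state them), and dropout is treated as the identity because it is only active during training.

```python
import math
import random

# (units, activation) for each dense layer described in the text.
# The "linear" output activation is an assumption.
LAYERS = [(256, "relu"), (750, "sigmoid"), (750, "relu"), (1500, "linear")]
N_INPUTS = 18


def init_params(n_inputs=N_INPUTS, layers=LAYERS, seed=0):
    """Create small random weights and zero biases (assumed initialisation)."""
    rng = random.Random(seed)
    params, fan_in = [], n_inputs
    for units, _ in layers:
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(fan_in)]
                   for _ in range(units)]
        params.append((weights, [0.0] * units))
        fan_in = units
    return params


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


def activate(z, kind):
    if kind == "relu":
        return max(0.0, z)
    if kind == "sigmoid":
        # Numerically stable form that avoids overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return z  # linear


def forward(x, params, layers=LAYERS):
    """Forward pass at inference time; the dropout layer is a no-op here."""
    h = list(x)
    for (weights, biases), (_, kind) in zip(params, layers):
        h = [activate(sum(w * v for w, v in zip(row, h)) + b, kind)
             for row, b in zip(weights, biases)]
    return h
```

With these sizes the network has 1,887,364 trainable parameters, and `forward` maps an 18-value feature vector to 1500 values, that is, 100 frames of five 3D joint positions.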

Model Training
The whole dataset was split into five folds, of which four were used for model training and one was held out for testing. To avoid over-fitting, the samples in the four training folds were further split into a training set (60%) and a validation set (40%). The mean squared error (MSE) was employed to tune the DNN's hyperparameters during the training process. A full-batch learning strategy was applied so that the gradient was calculated more accurately. With this strategy, the DNN was trained for 1000 epochs with the RMSprop optimizer, with the learning rate set to 0.001 and the dropout rate of the dropout layer set to 0.2.
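The splitting scheme can be illustrated as follows. This is a sketch under assumptions: the shuffling, the choice of the held-out fold, the rounding of the 60% share and the function name are hypothetical, since the text only gives the proportions.

```python
import random


def split_indices(n_samples, n_folds=5, test_fold=0, train_fraction=0.6, seed=0):
    """Return (train, validation, test) index lists for one held-out fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_folds folds of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    test = folds[test_fold]
    rest = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    # Split the remaining four folds 60/40 into training and validation.
    n_train = round(train_fraction * len(rest))
    return rest[:n_train], rest[n_train:], test
```

For the 120 reaches recorded per side, this gives 24 test samples and 96 remaining samples, which are divided into 58 training and 38 validation samples under the rounding assumed here.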

Model Evaluation
We evaluate the performance of the model by calculating the similarity between the predicted trajectory and the real trajectory of every joint. The similarity is measured by the discrete Fréchet distance (coupling distance) [31], which is defined as follows:
$$F(P, Q) = \min_{\alpha, \beta} \; \max_{t \in [t_s, t_e]} \left\| P(\alpha(t)) - Q(\beta(t)) \right\|$$
where t_s and t_e are the start and end time instants, respectively; N and M are the end positions of trajectories P and Q; P(α(t)) and Q(β(t)) are the two trajectories; and α(t) and β(t) are the position description functions of P and Q, respectively. Additionally, α(t_s) = 0, α(t_e) = N, β(t_s) = 0 and β(t_e) = M. For every joint, a smaller coupling distance between the prediction and the expectation means a higher similarity.
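For sampled trajectories, the coupling distance is commonly computed with the dynamic-programming recurrence of Eiter and Mannila. The sketch below is one such implementation in plain Python; the function name is illustrative and it is not claimed to be the code used in this study. It assumes both trajectories are non-empty sequences of points with the same dimensionality.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet (coupling) distance between point sequences p and q."""
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance between p[:i + 1] and q[:j + 1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance (Python 3.8+)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Either one or both walkers advance; keep the best option.
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]
```

For two parallel straight paths that are 1 unit apart, such as `[(0, 0, 0), (1, 0, 0), (2, 0, 0)]` and `[(0, 1, 0), (1, 1, 0), (2, 1, 0)]`, the function returns 1.0. For a predicted and a measured joint trajectory of 100 frames each, the table has 100 × 100 entries, so the cost per joint is small.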

Results and Discussion
The left and right parts of Fig. 5 show the predicted and expected trajectories of the dominant and non-dominant sides, respectively. The predicted results and the real trajectories show a high similarity. For the dominant side, the predicted trajectories of the hand and elbow almost coincide with the real trajectories (see Fig. 5a and Fig. 5b). The generated trajectories of the shoulder, C7 and T8 do not coincide as well with the expected trajectories. Nevertheless, the shapes of these predicted trajectories are highly similar to those of the expected movement trajectories (see Fig. 5c, Fig. 5d and Fig. 5e). This indicates that the DNN can predict the dominant-side movement with promising accuracy. For the non-dominant side, the predicted results for the hand and elbow coincide well with the real trajectories (see Fig. 5f and Fig. 5g). In contrast, the generated results for the shoulder, C7 and T8 are not very close to the expected trajectories but still show similar shapes (see Fig. 5h, Fig. 5i and Fig. 5j). Thus, the predicted results for the non-dominant side are still highly similar to the real movements. These results are consistent with Fig. 6a and Fig. 6b, which show the coupling distances between the predicted and real movements calculated for the testing samples. The blue bars show the average coupling distance between the predicted and real trajectories in the testing dataset, and the error bars show the standard deviation of these coupling distances. It is clear that the predictions on the dominant side are more stable than those on the non-dominant side. The prediction errors for the hand and elbow are lower than those for the shoulder, C7 and T8 on both sides.
The average coupling distances between the predicted and expected trajectories of the testing samples were calculated and compared with the average coupling distance between each pair of trajectories in the training dataset (see Table 1 and Table 2). For the dominant side, the average coupling distances of the hand and elbow trajectories on the test dataset are about 60 mm and 50 mm, respectively, which is much lower than the corresponding values on the training dataset. In contrast, the coupling distances of the shoulder, C7 and T8 are much closer to the average coupling distances on the training dataset. This indicates that the predictions for the hand and elbow are actually better than those for the shoulder, C7 and T8, even though the average coupling distances of the latter are slightly lower than those of the former. However, the average coupling distances of the shoulder, C7 and T8 are still much lower than those on the training dataset. This shows that the predicted results for the dominant-side movements are highly similar to the real movement trajectories. For the non-dominant side, significant differences between the average coupling distances on the testing and training datasets can be found for the hand and elbow, so the predicted trajectories of the hand and elbow are very close to the real trajectories. The average coupling distance of the shoulder is much closer to the average coupling distance on the training dataset but is still lower. For C7 and T8, these values are slightly higher than the average coupling distances on the training dataset, so the prediction for these two joints still needs to be improved.
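The comparison described in this paragraph sets the mean prediction error against the natural variability of the training trajectories. The sketch below shows one way to compute both quantities for a single joint. It assumes that "each pair" means every unordered pair of training trajectories and takes the distance function as a parameter; the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_distance(trajectories, distance):
    """Average distance over all unordered pairs of trajectories (baseline)."""
    return mean(distance(a, b) for a, b in combinations(trajectories, 2))


def mean_prediction_distance(predicted, measured, distance):
    """Average distance between each prediction and its measured trajectory."""
    return mean(distance(p, m) for p, m in zip(predicted, measured))
```

With the coupling distance passed as `distance`, a prediction is informative for a joint when `mean_prediction_distance` on the test set is clearly below `mean_pairwise_distance` on the training set, because the model is then closer to the true trajectory than an arbitrary other reach would be.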
For the non-dominant side, the coupling distances between the prediction and the expectation are much higher than for the dominant side. This means that the DNN model performs better on the dominant side than on the non-dominant side. The reason might be that the target settings of the dominant-side experiment were more reasonable, so that the data samples collected for the dominant side could cover the whole range of motion of the participant's upper limbs. As shown in Table 1 and Table 2, the average coupling distances for C7 and T8 on the dominant side are much higher than on the non-dominant side, indicating that the dominant-side movement dataset is more diverse than the non-dominant one. In a word, the DNN shows …
The presented model provides a good solution for predicting the 3D movements of upper-limb joints with promising accuracy. As far as we are aware, such a trajectory prediction model for the 3D reaching movements of multiple upper-limb joints has not been described before. It suggests a novel approach to motor function assessment, on the basis of which a customized assessment technique for stroke patients could be developed. To achieve this, more healthy subjects will need to be involved to improve the generalization ability of the model, so that it can predict the different reaching trajectories of participants with different personalities.

Conclusions and Future Work
The DNN model shows a strong ability to identify a specific dynamic process. The generated trajectories are very close to the real trajectories. For the dominant side, the trajectories generated by the DNN coincide well with the corresponding real trajectories, especially for the hand and elbow, where the predicted and expected trajectories are almost the same. The predicted results for the shoulder, neck and trunk also show high similarity to the real movements. For the non-dominant side, the movements of the hand and elbow generated by the DNN are also highly representative of the movements made by the participant, and the same holds for the movements of the shoulder, neck and trunk. In summary, the presented DNN model gives a good prediction of 3D reaching movements for multiple joints of the human upper limbs. In future research, more healthy participants will be involved to address the generalization issue. The prediction model will also be applied to stroke survivors to estimate their deviation from the normal movement pattern and to inform rehabilitation assessment and training.