RNN-Based Deep Learning on Skeleton Data for Rehabilitation Recognition

With the extensive application of deep learning in the field of human rehabilitation, skeleton-based rehabilitation recognition is attracting more and more attention, supported by large-scale skeleton datasets. The key factor in this task is the combination of two representations: the intra-frame representation of joint co-occurrence and the inter-frame representation. In this paper, an inter-frame representation method based on RNN is proposed. The position of each joint is encoded independently, and the encoded joints are then assembled into a semantic representation in both the spatial and temporal domains. We introduce a global spatial aggregation, which is able to learn superior joint co-occurrence features over local aggregation.


Introduction
Human rehabilitation recognition is a basic scenario for AI applications. Describing human actions in terms of joints is well suited to human rehabilitation, for two reasons. Firstly, compared with other modalities such as optical flow and RGB, the amount of data is very small and the target is clear. Secondly, joint data is highly robust to interference from background noise. These two characteristics give skeleton-based models the inherent advantages of being lightweight and achieving a high recognition rate.
In this paper, we focus on the workflow for skeleton-based rehabilitation recognition (Figure 1).

Related Work
The RNN is designed to model the entire process of exercise rehabilitation as a sequence. It is a natural choice for deep learning from skeleton sequences and has been used on a large scale for this purpose. With the widespread use of deep learning, more and more literature uses RNN to learn skeleton features and apply them at scale in various scenarios.

Co-occurrence Feature Learning with RNN
RNN is one of the most powerful and successful neural network models and has been widely used in image classification, object detection, video classification and so on. Compared with other sequence structures, RNN can utilize historical information, take the action sequence into account, and encode time and space asynchronously. Using the convolution characteristics of RNN, a human rehabilitation action is decomposed into two steps: first, the action characteristics across the spatial domain, including the motion position and the range of motion; second, the fitting of motion features in the Ag and over the whole motion process. Finally, the softmax output gives the accuracy of the rehabilitation result with respect to posture and range of movement. Let T denote a 3D tensor of size $D_1 \times D_2 \times D_3$. In the convolution process, the dimensions holding motion posture and range of motion can be transposed to meet different rehabilitation needs, so that the information along any dimension $D_i$ can be aggregated over the whole process. If key actions are specified within the action process, the two actions immediately preceding and following each key action can be used as key convolution frames.
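The effect of transposing a tensor before aggregation can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions, not the network described here: it uses nested Python lists for a small $D_1 \times D_2 \times D_3$ tensor, the helper names `transpose_last_two` and `aggregate_last_axis` are hypothetical, and a plain sum stands in for a learned aggregation layer. Whichever dimension is moved to the last axis is the one that gets aggregated in full.

```python
def transpose_last_two(t):
    """Swap the last two axes of a 3D nested list: (D1, D2, D3) -> (D1, D3, D2)."""
    return [[list(col) for col in zip(*frame)] for frame in t]


def aggregate_last_axis(t):
    """Sum over the last axis (a stand-in for a learned global aggregation)."""
    return [[sum(row) for row in frame] for frame in t]


# Toy tensor with made-up values: 2 frames x 3 joints x 2 coordinates.
T = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]],
]

# Without transposition, the coordinates of each joint are aggregated.
per_joint = aggregate_last_axis(T)

# After transposition, all joints are aggregated for each coordinate.
per_coord = aggregate_last_axis(transpose_last_two(T))
```

Here `per_joint` has one value per joint in each frame, while `per_coord` has one value per coordinate in each frame, computed from all joints at once.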

Explicit Skeleton Motion
In addition to the co-occurrence of joints, the movement of a group of joints over time is key to describing a rehabilitation action. Therefore, in human rehabilitation, skeleton motion is taken as the key-frame information and introduced into the convolution network of RNN.
We formulate the skeleton at frame $t$ as $K^t = \{D_1^t, D_2^t, \ldots, D_N^t\}$, where $N$ is the number of joints and $D = (x, y, z)$ is a 3D joint coordinate. The skeleton motion is defined as the temporal difference of each joint between two consecutive frames: $Z^t = K^{t+1} - K^t = \{D_1^{t+1} - D_1^t, D_2^{t+1} - D_2^t, \ldots, D_N^{t+1} - D_N^t\}$. The original skeleton coordinates $K$ and the skeleton motion $Z$ are each sent into convolution separately, as spatial coordinates and temporal coordinates respectively. In the subsequent calculation, the two are merged.
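The temporal difference defined above can be computed directly from a sequence of skeletons. The following is a minimal sketch in plain Python; the function name `skeleton_motion` and the sample coordinates are illustrative assumptions, and each joint is assumed to be given as an (x, y, z) tuple.

```python
def skeleton_motion(frames):
    """Return the per-joint difference between each pair of consecutive frames.

    frames: list of skeletons, each a list of N joints given as (x, y, z).
    The result has one entry fewer than the input sequence.
    """
    motion = []
    for k_t, k_next in zip(frames, frames[1:]):
        motion.append([
            tuple(b - a for a, b in zip(joint_t, joint_next))
            for joint_t, joint_next in zip(k_t, k_next)
        ])
    return motion


# Made-up example: 3 frames, 2 joints per frame.
K = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 2.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
]
Z = skeleton_motion(K)
```

Because the difference is taken between consecutive frames, a sequence of T frames yields T - 1 motion frames; how the lengths are aligned before the two inputs are merged is not specified here.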

Hierarchical Co-occurrence Network
In this section, the RNN-based rehabilitation recognition network is described in detail. The network architecture is shown in Figure 3. The joint tensor sequence X can be represented by a $Z \times G \times W$ tensor, where Z is the number of frames, G is the number of joints in the skeleton, and W is the number of coordinates. The skeleton motion is fed into the network in the same way as X, so the two serve as the two input streams. The two streams interact within the same system, together with their parameters. Their features are fused after convolution. Given the state and the motion of a skeleton, hierarchical learning is performed. In the first calculation, the point-level features are computed by convolution with 1 (1) and 1 (2). The size of the body is fixed, so the independent three-dimensional coordinate system of each joint is fixed, and the point-level feature is added to this joint coordinate system. After that, we use RNN to move the 3D coordinates of the joints into the channel dimension. In the second calculation, all the global features are classified by two fully connected layers, for recovery action and rehabilitation time.
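As a rough illustration of how features from the two input streams might be fused, the sketch below concatenates per-frame feature vectors along the channel axis. Concatenation is an assumption made for the example, since the fusion operator is not specified above, and the function name `fuse_streams` and the feature values are hypothetical.

```python
def fuse_streams(coord_feats, motion_feats):
    """Concatenate per-frame feature vectors from the two streams channel-wise.

    Both inputs are lists with one feature vector per frame. Concatenation is
    an assumed fusion operator, chosen only for illustration.
    """
    if len(coord_feats) != len(motion_feats):
        raise ValueError("both streams must cover the same number of frames")
    return [list(c) + list(m) for c, m in zip(coord_feats, motion_feats)]


# Made-up features: 2 frames, 2 coordinate channels and 1 motion channel.
coord_feats = [[0.1, 0.2], [0.3, 0.4]]
motion_feats = [[1.0], [2.0]]
fused = fuse_streams(coord_feats, motion_feats)
```

After fusion, each frame carries the channels of both streams, so later layers can use position and motion information together.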
In the three converged calculations, the best accuracy occurs at the longest operating time. The test results are shown in Table 1.

Experiments
The NTU RGB+D dataset is so far the largest skeleton-based human action recognition dataset. It contains 56,880 skeleton sequences, each annotated as one of 60 action classes.
There are two recommended evaluation protocols, i.e. Cross-Subject (CS) and Cross-View (CV). During training, following standard practice, a random sub-sequence is extracted from each sample, with the extraction ratio drawn uniformly from [0.5, 1]. In the calculation process, the ratio of the skeleton sequence is 0.9. Because the samples differ in duration, we normalize them and interpolate the samples that are too short. The model is trained for 100k steps in total, each sequence is 1024, the initial learning rate is 0.0001, and the exponential decay is 0.999 per 10k steps.
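The preprocessing and learning-rate schedule described above can be sketched as follows. This is a minimal illustration under several assumptions rather than the exact training code: the helper names are hypothetical, interpolation is taken to be linear and applied to one coordinate channel at a time, and the decay is taken to be applied in discrete steps (once every 10k steps) rather than continuously.

```python
import random


def crop_subsequence(frames, ratio, start=None):
    """Keep a contiguous sub-sequence covering `ratio` of the frames."""
    n = len(frames)
    length = max(1, round(n * ratio))
    if start is None:
        start = random.randint(0, n - length)  # random position for training
    return frames[start:start + length]


def resample(values, target_len):
    """Linearly interpolate a 1D sequence to `target_len` samples."""
    n = len(values)
    if n == 1 or target_len == 1:
        return [values[0]] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into `values`
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def learning_rate(step, base_lr=1e-4, decay=0.999, decay_steps=10_000):
    """Exponential decay applied once every `decay_steps` (assumed staircase)."""
    return base_lr * decay ** (step // decay_steps)


# Training-style crop: the ratio is drawn uniformly from [0.5, 1].
sequence = list(range(10))
train_crop = crop_subsequence(sequence, random.uniform(0.5, 1.0))

# Fixed ratio of 0.9, here taken from the start of the sequence.
fixed_crop = crop_subsequence(sequence, 0.9, start=0)

# A short sample stretched to a longer fixed length.
stretched = resample([0.0, 2.0], 3)
```

With these settings the learning rate changes very slowly: after the full 100k steps it has been multiplied by 0.999 ten times.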

Conclusions
We present an RNN-based deep learning framework for skeleton-based human rehabilitation recognition. By using the convolution network of RNN, a three-dimensional coordinate