Motion Capture Technique with Enhancement Filters for Humanoid Model Movement Animation



INTRODUCTION
3D animation is the representation of characters or objects in animated form so that they look more alive and real. Producing 3D animation requires long processing times and a large amount of funding, because most 3D animated films still rely on keyframing, which makes the production process involve many steps [1]. Making 3D animation in a studio with the full support of animation tools costs around 3000 Euros per month, depending on the duration of tool usage and data collection. One technological development in the field of 3D animation is the motion capture technique. Motion capture is the process of recording the movement of an object and then applying it to a digital form or 3D object. This technique is commonly used to make live-action or animated films and games. It can effectively cut production costs and increase the efficiency of 3D animation production. However, the result of 3D animation made using motion capture is not always perfect: usually, there is shakiness in the legs or arms of the object [2]. Therefore, an enhancement method is needed to optimize the final result, namely the use of animation layers and filtering. An animation layer is useful for enhancing motion capture results because it edits only the points in the animation that require changes, leaving the other data and objects unchanged. Filtering helps to prevent movements that penetrate the object's body and reduces the shakiness in the object's legs or arms.
Generating animation using the motion capture method has been proposed by many researchers [1]-[9]. In previous related research, motion capture of a humanoid model for sports animation movements using markerless sensors did not always produce good results, so extra effort was needed to obtain a good animation [8]. Another related work uses silhouette-based motion capture without markers. The system comprises level-set-based silhouette extraction, a module that connects the image data with the model data, and a pose estimation module. Experiments were carried out with different camera settings and estimated model components with 21 degrees of freedom at up to two frames per second. To handle the stability problem, a marker-based motion estimation system was also used. The method was applied to the analysis of markerless sports movements [4]. In the work entitled "Applicability of a single depth sensor in real-time 3D clothes simulation: augmented reality virtual dressing room using Kinect sensor", a Kinect sensor is used to create a virtual dressing room that assists users in making online purchases, especially of clothes. The system can simulate clothes in 3D form in real time [10]. The last reference discusses motion capture with multiple Kinect controllers, namely the Kinect 360 and Kinect One. The controllers are used in parallel to produce a good motion capture result, and the experiments compare single and multiple Kinect setups for implementing motion capture systems [9]. In this research, the proposed motion capture technique is enhanced with several filters to create animation for a humanoid model. The paper is organized as follows: Section 2 explains the proposed motion capture method, Section 3 presents the results and discussion, and Section 4 presents the conclusion of this work.

THE PROPOSED MOTION CAPTURE METHOD
The proposed motion capture method consists of several steps: reading the movement of skeleton joints using a Kinect sensor, filtering to enhance the movement, mapping the skeleton joints to humanoid bones, and finally recording the animation. The flowchart of the proposed method is shown in Fig. 1. The details of each step are explained in the following sub-sections.

Enhancement Filters
The filtering step optimizes the movements of the skeleton joints tracked by the sensor. The filters limit the joint movements to minimize the shakiness or jitter produced by the sensor or by sudden fast movements of the user. The filters are adopted from the MS-SDK library [11], namely:
a) Smoothing filter. The smoothing filter detects the difference between the coordinates of a joint in the current frame and in the previous frame. If the difference is too large, it is likely shakiness, because the position of each joint should change only slightly from one frame to the next.
b) Bone orientation filter. A bone orientation filter is applied so that a joint does not rotate too far and follows the orientation direction of the humanoid model.
c) Movement speed filter. This filter limits the movement speed and provides a motion tolerance so that the animation sequence is maintained properly.
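The smoothing and movement speed filters described above can be illustrated with a minimal sketch. This is not the MS-SDK implementation: the function names, the thresholds (`max_jump`, `max_speed`), and the blending weights are assumptions chosen only for illustration, and positions are assumed to be `(x, y, z)` tuples in metres.

```python
import math

def smooth_joint(prev, curr, max_jump=0.15, alpha=0.5, jitter_alpha=0.1):
    """Blend the previous and current joint positions.

    If the frame-to-frame jump exceeds max_jump (assumed threshold), the new
    sample is treated as likely jitter and given a much smaller weight.
    """
    jump = math.dist(prev, curr)
    weight = alpha if jump <= max_jump else jitter_alpha
    return tuple(p + weight * (c - p) for p, c in zip(prev, curr))

def limit_speed(prev, curr, dt, max_speed=3.0):
    """Clamp the displacement so the joint moves at most max_speed * dt."""
    jump = math.dist(prev, curr)
    max_step = max_speed * dt
    if jump <= max_step:
        return curr
    scale = max_step / jump  # jump > max_step >= 0, so no division by zero
    return tuple(p + scale * (c - p) for p, c in zip(prev, curr))
```

In such a sketch each tracked joint would be passed through both functions once per frame before mapping. Damping a suspicious sample instead of discarding it is a design choice made here so that a genuinely fast movement is still followed after a few frames.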

Mapping Skeleton Joints to Humanoid Bones
The mapping step maps the Kinect skeleton joint data onto the humanoid bone data. This mapping process produces a motion animation of the 3D humanoid model [1]. There are 25 joints in the Kinect data, while there are 52 joints in the humanoid bones model. Therefore, not all joints in the humanoid bones model can be used, for example the finger bones (except for the index finger and thumb) and the toes. The mapping of the 25 skeleton joints from the Kinect sensor to the bones of the humanoid model (the standard configuration from Unity), together with the respective parent bones, is shown in Table 1. Based on Table 1, the head and chest bones were not used in this research because the standard Kinect joints do not track the movement of these bones; they apply only when face detection is turned on to track head movements. After mapping, each bone of the humanoid model is updated, that is, translation and rotation are carried out based on the movement of the skeleton joints from the Kinect sensor.
The translation is carried out relative to the root, namely SpineBase, by adding the offset of the change in position to the initial position. Quaternion-based rotation is performed by multiplying the initial rotation of each bone of the humanoid model by the rotation of the corresponding Kinect skeleton joint. The rotation of a Kinect skeleton joint is calculated from the orientation of the current joint with respect to the next connected joint. Once the rotation values are obtained, the animation can be performed.
The results of mapping the skeleton joints to the humanoid bones are shown in Fig. 3, where (a) is the position and orientation of the skeleton joints, (b) is the humanoid model movement, and (c) is the user movement. Based on Fig. 3, the humanoid model correctly follows the movement of the user.
Table 1. Mapping skeleton joints from Kinect to humanoid bones.
Fig. 3. Example of mapping skeleton joints to the humanoid bones.
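The translation and quaternion rotation updates described in this sub-section can be sketched as follows. This is a minimal illustration, not the code used in this research: the function names are hypothetical, quaternions are assumed to be `(x, y, z, w)` tuples, and the multiplication order (joint rotation applied on top of the initial bone rotation) is an assumption, since the text does not state the order.

```python
def quat_multiply(a, b):
    """Hamilton product a * b of two quaternions given as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def update_bone_rotation(initial_rotation, joint_rotation):
    """New bone rotation: Kinect joint rotation composed with the bone's
    initial rotation (order is an assumption, see the note above)."""
    return quat_multiply(joint_rotation, initial_rotation)

def update_root_position(initial_position, spine_base_start, spine_base_now):
    """New root position: initial position plus the offset by which the
    SpineBase joint has moved since tracking started."""
    return tuple(
        p + (now - start)
        for p, start, now in zip(initial_position, spine_base_start, spine_base_now)
    )
```

Only the root bone is translated in this sketch; all other bones follow through their parent hierarchy and receive rotation updates only.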

Animation Recording
The recording step is the final step of the animation-making process using the motion capture technique. At this step, every movement generated by the motion capture method is recorded and then rendered to produce an animation. The animation is stored in the .anim file format. The recording step stores each bone or joint that moves in a frame as a keyframe. Transformation data such as position, orientation, and scale are recorded at this step.
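The per-frame keyframe recording described above can be sketched as a small data structure. This is an illustrative assumption only: the class and field names are hypothetical, the 30 frames-per-second rate is assumed, and the sketch keeps keyframes in memory rather than writing Unity's .anim format.

```python
from dataclasses import dataclass, field

@dataclass
class Keyframe:
    time: float       # seconds since the start of the recording
    position: tuple   # (x, y, z)
    rotation: tuple   # quaternion (x, y, z, w)
    scale: tuple      # (x, y, z)

@dataclass
class AnimationRecorder:
    frame_rate: float = 30.0                    # assumed capture rate
    tracks: dict = field(default_factory=dict)  # bone name -> list of Keyframe
    frame_index: int = 0

    def record_frame(self, bones):
        """Store a keyframe for every bone whose transform changed.

        bones maps a bone name to a (position, rotation, scale) triple.
        """
        time = self.frame_index / self.frame_rate
        for name, (position, rotation, scale) in bones.items():
            track = self.tracks.setdefault(name, [])
            if track:
                last = track[-1]
                if (last.position, last.rotation, last.scale) == (position, rotation, scale):
                    continue  # bone did not move in this frame, skip it
            track.append(Keyframe(time, position, rotation, scale))
        self.frame_index += 1
```

Calling `record_frame` once per rendered frame yields one keyframe track per bone, with unchanged bones contributing no new keyframes.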

Performance Measurements
The performance is measured using the MRI (Mean Rating Index) and MSE (Mean Square Error) metrics. MRI is the average of the rating index (RI) and can be used to measure the success of research that involves the opinions of respondents. MSE measures the deviation between the observed rating index (RIobs) and the expected rating index (RIexp); the lower the MSE score, the better the performance of the method. Let i be the Likert-scale weight, where i = 1, 2, 3, 4, 5, S_i the number of respondents at each weight, N the number of tests, and RI_j the rating index at test j. Together with the Likert scale value of weight i and the total number of respondents, these quantities are used in the formulas for RI, MRI, and MSE given in (1), (2), and (3), respectively [12].
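Because the formulas themselves are not reproduced in this text, the following is only a sketch of one common way to compute these metrics, not necessarily the definitions of [12]. It assumes RI = (sum of i × S_i) / (number of respondents), MRI as the mean of RI_j over the N tests, and MSE as the mean of the squared differences between RIobs and RIexp; any normalization used in [12] is not modelled, and the function names are hypothetical.

```python
def rating_index(counts):
    """Weighted mean Likert rating for one test.

    counts[i - 1] is S_i, the number of respondents who chose weight i
    (i = 1..5).
    """
    respondents = sum(counts)
    if respondents == 0:
        raise ValueError("at least one respondent is required")
    return sum(i * s for i, s in enumerate(counts, start=1)) / respondents

def mean_rating_index(ratings):
    """Average of the rating indices RI_j over all tests."""
    if not ratings:
        raise ValueError("at least one test is required")
    return sum(ratings) / len(ratings)

def mean_square_error(observed, expected):
    """Mean of squared differences between observed and expected RI."""
    if not observed or len(observed) != len(expected):
        raise ValueError("observed and expected must be non-empty and equal in length")
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)
```

For example, `rating_index([0, 0, 5, 10, 10])` averages the answers of 25 respondents for a single test, and the resulting per-test values can be passed to `mean_rating_index` and `mean_square_error`.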

RESULTS AND DISCUSSION
The proposed motion capture method is implemented using humanoid models from the Mixamo platform [13], the C# programming language, Unity [14], the Kinect library [15], and the MS-SDK library [11]. The test was conducted on a laptop with an Intel i7 processor, an NVIDIA GTX 1050, and 16 GB of RAM. The result of each step in producing an animated motion using the motion capture technique on a humanoid model is described as follows:

The Result of Mapping Skeleton Joints to Humanoid Bones
Input data from the Kinect sensor, such as skeleton joint positions and orientations, are mapped onto humanoid bones using the standard configuration of the Unity humanoid avatar. The mapping result is shown in Fig. 4, where (a) is the position of the skeleton joints in X, Y, and Z coordinates together with their rotation quaternions and (b) is the position and orientation of the respective humanoid bones. Based on Fig. 4, the data of 25 Kinect skeleton joints are mapped onto 25 bones of a 3D humanoid model to represent a motion that corresponds to the user movement. The mapping result in Fig. 4 has already been enhanced with the smoothing filter, bone orientation filter, and movement speed filter. These filters are applied because the skeleton joint data are sensitive to the sensor accuracy, so shakiness and jitter are often found in the hand or leg joints. The filters minimize these effects and optimize the movements before they are mapped onto the humanoid bones. The result of the enhancement filters is shown in Fig. 5, where (a) is the movement before applying the filters and (b) is the movement after applying them. Based on Fig. 5, the filters produce fairly accurate movement results. Based on the example animation in Fig. 6, the mapping from skeleton joints onto humanoid bones is fairly accurate, and the animation sequence shows no shakiness or jitter in any frame. However, the model is not yet capable of performing turning motions in which the user turns their back to the camera because, in this research, the head position is not tracked.

Performance Evaluation
The performance is evaluated using MRI and MSE scores obtained from a questionnaire distributed to 25 respondents. The questionnaire compares 10 modern dance movements performed by the user and by the 3D humanoid model, and the respondents were asked to rate the similarity of the movements. The result achieves an MRI score of 4.276 and an MSE score of 0.053936. Based on the MSE result, the proposed method is considered acceptable because the MSE score is close to the 5% threshold of [12]. The test results can be seen in Table 2.

Fig. 1. Flowchart of the proposed motion capture method.

Skeletal Tracking Using Kinect
Skeleton joints are tracked using a Kinect sensor in the form of 3D coordinate positions and orientations of the user's joint movements detected by the sensor. The data obtained from the sensor are mapped onto humanoid body bones data so that they can be applied to a 3D humanoid model. The configuration of the skeleton joints of the Kinect sensor and the humanoid body bones of Unity can be seen in Fig. 2, where (a) is the skeleton joints tracked by the sensor, (b) is the configuration index of the skeleton joints, (c) is the rigged humanoid model, and (d) is the humanoid bones configuration in Unity.

Fig. 2. Configuration of skeleton joints from Kinect and humanoid bones from Unity: (a) tracked skeleton joints, (b) configuration of skeleton joints, (c) rigged humanoid, (d) configuration of humanoid bones.

Fig. 4. The result of mapping skeleton joints to humanoid bones.

Fig. 5. The result of applying enhancement filters.
An example of an animation result generated by the motion capture technique is shown in Fig. 6.

Table 2. The MRI and MSE scores.

CONCLUSIONS
A motion capture technique with enhancement filters for humanoid model movement animation using a Kinect sensor has been proposed and tested. The test was conducted on 10 modern dance animations that were made in only approximately 3 hours (without an editing process or correction of erroneous gestures), which shows the efficiency in production time. The effect of the enhancement filters is shown in the accurate movements generated by the method. The proposed method achieves an MRI score of 4.276 and an MSE score of 0.053936, which is close to the acceptance threshold (MSE ≤ 0.05) [12].