Zhang: An Efficient Bilinear Factorization Based Method for Motion Capture Data Refinement

As a preprocessing step, motion capture (mocap) data refinement predicts missing data and removes noise and outliers. In recent years, low-rank matrix completion has been successfully applied to mocap data refinement by exploiting the low-rank property of motion data; a representative approach is TSMC, proposed by Feng et al. However, TSMC depends heavily on the singular value decomposition (SVD) and requires, at each iteration, the inversion of a smoothing-related matrix whose size equals the number of frames in the motion sequence. It is therefore very slow for long motion sequences. In this paper, we present an efficient method in which matrix bilinear factorization and the variational definition of the matrix nuclear norm are employed to avoid the SVD. In addition, after analyzing the eigendecomposition of the smoothing-related matrix, we convert the matrix inversion into a discrete cosine transform (DCT) and inverse DCT. The augmented Lagrange multiplier algorithm is adopted to solve the refinement optimization model. Experimental results show that the proposed approach is considerably more accurate and efficient than TSMC.


Introduction
Optical motion capture (mocap) is a prevalent technology for recording the 3D position and orientation of a moving subject from multiple cameras. Such information has various applications. For example, in the entertainment industry, virtual characters can be animated using motion data captured from actors; in sports, athletes can improve their training by analyzing the mocap data of their own motion [1,2]. However, even with the most expensive commercial mocap systems, e.g. Vicon and Motion Analysis, there are still instances where noise and missing data are inevitable. To ensure high-quality results in each application, the corrupted mocap data should be refined. It has therefore become an important research branch of motion processing to handle the following two sub-problems: predicting the missing values, and removing the noise and outliers. These two sub-problems are collectively referred to as mocap data refinement [3].
Mocap data refinement is a challenging problem because human motion consists of highly coordinated movements. During processing, exploiting the structural relationship among human joints and the spatio-temporal patterns embedded in human motion yields better refining results [3][4][5][6][7][8][9]. Due to the high articulation and correlation of human motion, a human motion sequence represented as a matrix is approximately low rank. This low-rank property was used as a prior in [7], where the mocap data refinement problem was formulated as low-rank matrix completion. Based on that work, Feng et al. [3] proposed a noise-robust and temporally stable matrix completion (TSMC) model which takes both the temporal stability and the low-rank structure of motion data into account, and consequently performs better than the method of [7] and three other commonly used methods, i.e. linear interpolation, spline interpolation and DynaMMo [5]. While the work [3] is effective, it is slow, especially for long motion sequences, because its implementation is based on singular value thresholding (SVT) [8], which requires a full or partial SVD at each iteration; this becomes increasingly costly as the matrix dimensions grow.
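The approximate low-rank property is easy to check numerically. The sketch below builds a synthetic, smooth "motion-like" matrix (an illustration of our own, not CMU data: each coordinate trajectory is a combination of a few slowly varying curves) and measures how much energy the top singular values capture:

```python
import numpy as np

def lowrank_energy(X, k):
    """Fraction of squared Frobenius energy captured by the top-k singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

# Synthetic "motion-like" matrix: 3J coordinate trajectories over n frames,
# each a combination of r slowly varying basis curves, plus small noise.
rng = np.random.default_rng(0)
J, n, r = 31, 500, 5
t = np.linspace(0.0, 1.0, n)
basis = np.stack([np.sin(2 * np.pi * (k + 1) * t) for k in range(r)])  # r x n
X = rng.standard_normal((3 * J, r)) @ basis                            # (3J) x n
X += 0.01 * rng.standard_normal(X.shape)                               # noise

print(lowrank_energy(X, 10))  # close to 1: approximately low rank
```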
This paper addresses the computational efficiency of the TSMC algorithm. Inspired by the bilinear factorization [9], we use the variational definition of the nuclear norm, i.e. half the sum of two squared Frobenius norms, instead of the sum of singular values. Besides, as the dimension of the smoothing-related matrix equals the number of frames, which is very large for long motion sequences, it is too expensive to directly calculate its inverse at each iteration. We therefore analyze the structure of the smoothing-related matrix in TSMC and reduce the inverse calculation to the discrete cosine transform (DCT) and inverse DCT (IDCT). Compared with TSMC, the proposed approach not only produces comparable and even better refining results, but also runs much faster (about 7 times faster than TSMC for a motion sequence of about 5000 frames).
The remainder of this paper is organized as follows. Section II briefly reviews the TSMC algorithm. Section III introduces the proposed approach in detail. Section IV gives various experimental results. Finally, we conclude this paper in Section V.

The TSMC algorithm
In this paper, we denote a motion sequence of n frames as a matrix X = [x_1, x_2, …, x_n] ∈ ℝ^{3J×n}, where J is the number of joints in a human skeleton and each column x_i ∈ ℝ^{3J} represents a frame. Mocap data are captured as a sequence of frames, but some frames contain missing data and some contain noise and outliers. Before introducing our approach, we briefly describe the TSMC algorithm in [3]. Assuming that the noise is sparse in the observed part, TSMC refines the corrupted mocap data via the following model:

min_{Y,E} rank(Y) + λ‖P_Ω(E)‖_0 + (β/2)‖YC^T‖_F^2   s.t. X = Y + E,   (1)

where Y is the refined motion matrix, E the error matrix, Ω the index set of observed entries, P_Ω the projection onto Ω, and C a smoothing-related difference matrix that enforces continuity on every marker's trajectory. Unfortunately, Eq. (1) is NP-hard due to the discontinuous and nonconvex nature of the rank function and the ℓ_0 norm. A widely used strategy is therefore to replace the rank function and the ℓ_0 norm with the nuclear norm (the sum of the singular values of a matrix) and the ℓ_1 norm (the sum of the absolute values of all entries), respectively [12]. More specifically, this yields the following optimization model:

min_{Y,E} ‖Y‖_* + λ‖P_Ω(E)‖_1 + (β/2)‖YC^T‖_F^2   s.t. X = Y + E.   (3)

The augmented Lagrange multiplier (ALM) method [13] was then employed to solve (3): the problem is partitioned into several sub-problems that are solved alternately. These sub-problems are based on two thresholdings, the singular value thresholding and the soft shrinkage operator S_τ(x) = sign(x)·max(|x| − τ, 0). In summary, the TSMC method is described in Algorithm 1 (for details see [3]).
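The two elementary operators the sub-problems rely on, the soft shrinkage S_τ and the projection P_Ω onto the observed entries, can be sketched as follows (a minimal illustration in our own notation):

```python
import numpy as np

def soft_shrink(X, tau):
    """Soft shrinkage S_tau(x) = sign(x) * max(|x| - tau, 0), applied entrywise."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def P_Omega(X, mask):
    """Projection onto the observed index set Omega (mask is True where observed)."""
    return np.where(mask, X, 0.0)

# Shrinkage zeroes small entries and pulls large ones toward zero.
A = np.array([[0.3, -2.0],
              [1.5, -0.1]])
print(soft_shrink(A, 0.5))
# [[ 0.  -1.5]
#  [ 1.  -0. ]]
```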

The proposed scheme
As shown in Algorithm 1, Step 4 requires computing the SVD of a matrix of size 3J × n. Computing a full SVD at every iteration is evidently too costly to be practical for truly large-scale problems [14][15]. Even for the partial SVD strategy, which computes only a subset of dominant singular pairs (values and vectors) instead of the full set, the computational cost can still be quite high for a wide range of large matrices. Meanwhile, Step 8 requires calculating the inverse of a matrix of size n × n, which is also too expensive when n is large. It is therefore desirable to find an alternative approach that avoids both the SVD computation and the inversion calculation, replacing them with less expensive operations. The goal of this paper is to develop such an SVD-free approach that also avoids inverting any large matrix, in order to solve the mocap data refinement problem more efficiently.

Bilinear factorization model
For a low-rank matrix A ∈ ℝ^{m×n}, bilinear factorization aims to find two smaller low-rank factors U ∈ ℝ^{m×d} and V ∈ ℝ^{n×d} such that

A = UV^T,   rank(A) ≤ d ≤ min(m, n),   (5)

where d is an upper bound on the rank of A. First, to avoid computing the SVD as in Step 4 of Algorithm 1, we adopt the variational definition of the nuclear norm:

‖A‖_* = min_{U,V: A=UV^T} (1/2)(‖U‖_F^2 + ‖V‖_F^2),   (6)

where the minimum is attained at the balanced factorization U = U_A Σ^{1/2}, V = V_A Σ^{1/2} obtained from the SVD A = U_A Σ V_A^T. Substituting Y = UV^T into (3) and replacing the nuclear norm by (6), we obtain the bilinear factorization model:

min_{U,V,E} (1/2)(‖U‖_F^2 + ‖V‖_F^2) + λ‖P_Ω(E)‖_1 + (β/2)‖UV^T C^T‖_F^2   s.t. X = UV^T + E.   (7)
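The variational definition (6) can be verified numerically: at the balanced SVD-based factorization, half the sum of the two squared Frobenius norms equals the sum of the singular values. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 40, 60, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

# Nuclear norm as the sum of singular values.
nuc = np.linalg.svd(A, compute_uv=False).sum()

# Balanced factorization U = U_A * sqrt(S), V = V_A * sqrt(S) from A = U_A S V_A^T.
U_A, s, VT_A = np.linalg.svd(A, full_matrices=False)
d = r  # any d >= rank(A) works
U = U_A[:, :d] * np.sqrt(s[:d])
V = VT_A[:d, :].T * np.sqrt(s[:d])

variational = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)
print(abs(variational - nuc))  # ~0: both definitions agree
```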

Optimization
To solve (7), we present an inexact augmented Lagrange multiplier method, called IALM [13]. We first consider an equivalent form of (7), obtained by introducing an auxiliary variable M = UV^T:

min_{U,V,E,M} (1/2)(‖U‖_F^2 + ‖V‖_F^2) + λ‖P_Ω(E)‖_1 + (β/2)‖MC^T‖_F^2   s.t. X = M + E,  M = UV^T.   (8)

Then the augmented Lagrangian function of (8) is

L(U,V,E,M,Y_1,Y_2) = (1/2)(‖U‖_F^2 + ‖V‖_F^2) + λ‖P_Ω(E)‖_1 + (β/2)‖MC^T‖_F^2 + ⟨Y_1, X − M − E⟩ + ⟨Y_2, M − UV^T⟩ + (μ/2)(‖X − M − E‖_F^2 + ‖M − UV^T‖_F^2),   (9)

where Y_1, Y_2 are Lagrange multipliers and μ > 0 is a penalty parameter. According to IALM, we solve (9) by minimizing over each variable alternately while fixing the others, so that the optimization is divided into four sub-problems plus two multiplier updates. More specifically, to solve for the variable U, we fix the other variables and solve the following least-squares problem:

U = argmin_U (1/2)‖U‖_F^2 + (μ/2)‖M − UV^T + Y_2/μ‖_F^2 = (μM + Y_2)V(I_d + μV^TV)^{−1}.   (10)

Similarly, we update the variable V as follows:

V = argmin_V (1/2)‖V‖_F^2 + (μ/2)‖M − UV^T + Y_2/μ‖_F^2 = (μM + Y_2)^T U(I_d + μU^TU)^{−1}.   (11)

For fixed U, V, M, we obtain the following optimization problem for E:

E = argmin_E λ‖P_Ω(E)‖_1 + (μ/2)‖X − M − E + Y_1/μ‖_F^2,   (12)

whose solution is P_Ω(E) = S_{λ/μ}(P_Ω(X − M + Y_1/μ)) on the observed entries and E = X − M + Y_1/μ on the unobserved ones. Finally, we update M by solving

M(βC^TC + 2μI_n) = μ(X − E) + Y_1 + μUV^T − Y_2.   (13)

Note that in (13), we need to calculate the inversion of the n × n matrix βC^TC + 2μI_n.
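The U-sub-problem is a ridge-regularized least-squares problem whose closed-form solution involves only a small d-by-d inverse. A minimal numerical sketch in our own notation (M is the auxiliary low-rank matrix, Y2 its multiplier, mu the penalty parameter), checked by verifying that the gradient vanishes at the solution:

```python
import numpy as np

def update_U(M, V, Y2, mu):
    """Minimize 0.5*||U||_F^2 + (mu/2)*||M - U V^T + Y2/mu||_F^2 over U.
    Setting the gradient to zero gives U (I + mu V^T V) = (mu M + Y2) V,
    which requires only a d-by-d inverse (d = number of factor columns)."""
    d = V.shape[1]
    return (mu * M + Y2) @ V @ np.linalg.inv(np.eye(d) + mu * (V.T @ V))

# Sanity check on a tiny instance: the objective's gradient vanishes at U.
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 8))
V = rng.standard_normal((8, 3))
Y2 = rng.standard_normal((6, 8))
mu = 2.0
U = update_U(M, V, Y2, mu)
grad = U - mu * (M - U @ V.T + Y2 / mu) @ V
print(np.abs(grad).max())  # ~0
```

The V-update is the same computation with the roles of U and V (and a transpose) exchanged.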

To avoid this inversion, we exploit the eigendecomposition of the smoothing-related matrix. Since C^TC is symmetric positive semidefinite, it admits the eigendecomposition

C^TC = WΛW^T,   W^{−1} = W^T,

with Λ = diag(λ_1, …, λ_n) containing the eigenvalues and the orthogonal matrix W containing the eigenvectors. It is also worth noting that W^T and W are actually the n-by-n type-2 DCT and IDCT matrices. Therefore, we have

(βC^TC + 2μI_n)^{−1} = W(βΛ + 2μI_n)^{−1}W^T,   (14)

so applying the inverse to each row of a matrix amounts to a DCT, a diagonal scaling, and an IDCT.
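The DCT route can be checked numerically. The sketch below assumes C is the first-order difference operator, so that C^TC is the Neumann Laplacian diagonalized by the type-2 DCT with eigenvalues 2 − 2·cos(πk/n), and solves a system of the form M(βC^TC + 2μI) = G both ways (the constant 2μ is our assumption; the exact constant depends on the sub-problem):

```python
import numpy as np
from scipy.fft import dct, idct

def smooth_solve(G, beta, mu):
    """Solve M (beta * C^T C + 2*mu*I) = G row-by-row via DCT/IDCT, assuming C
    is the first-order difference operator (C^T C = Neumann Laplacian)."""
    n = G.shape[1]
    lam = 2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)  # eigenvalues of C^T C
    Ghat = dct(G, type=2, norm="ortho", axis=1)          # rows into DCT domain
    Mhat = Ghat / (beta * lam + 2.0 * mu)                # diagonal scaling
    return idct(Mhat, type=2, norm="ortho", axis=1)      # back via IDCT

# Verify against the direct O(n^3) inversion on a small instance.
rng = np.random.default_rng(3)
n, beta, mu = 128, 5.0, 0.7
C = np.diff(np.eye(n), axis=0)                           # (n-1) x n differences
A = beta * (C.T @ C) + 2.0 * mu * np.eye(n)
G = rng.standard_normal((9, n))                          # e.g. 3J = 9 rows
M_dct = smooth_solve(G, beta, mu)
M_dir = G @ np.linalg.inv(A)
print(np.abs(M_dct - M_dir).max())  # agrees up to round-off
```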

Rank estimation
Since a proper estimate of the rank of the mocap data is essential for the bilinear factorization in (5), we make use of the last significant jump (LSJ) rule proposed in [17][18] to adaptively estimate the rank. The basic principle of LSJ is to detect the support of a sparse vector formed by the singular values, whose cardinality is the rank of a low-rank matrix.
Because the given mocap data X are incomplete, we first form an initial guess X̃ by filling the missing markers with linear interpolation. Next, we apply the thresholding-based LSJ rule to estimate the rank of X̃. In detail, let σ_1 ≥ σ_2 ≥ … > 0 be the singular values of X̃. The LSJ rule looks for the largest index r for which the jump σ_r/σ_{r+1} exceeds a prescribed threshold, and takes r as the rank estimate. Finally, considering that X also contains noise and outliers, the rank estimated above may be somewhat conservative, and thus we specify the parameter d satisfying d ≥ r as a slightly enlarged value of r.   (15)
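A simplified sketch of the LSJ rule (the jump threshold is a tunable assumption of ours):

```python
import numpy as np

def estimate_rank_lsj(X_tilde, jump=10.0):
    """Estimate the rank via the last significant jump in the singular value
    sequence: the largest index r with sigma_r / sigma_{r+1} > jump."""
    s = np.linalg.svd(X_tilde, compute_uv=False)
    s = s[s > 1e-12 * s[0]]              # drop numerically zero singular values
    ratios = s[:-1] / s[1:]
    idx = np.flatnonzero(ratios > jump)
    return int(idx[-1] + 1) if idx.size else len(s)

# A rank-5 matrix plus mild noise: the jump from sigma_5 to sigma_6 dominates.
rng = np.random.default_rng(4)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 200))
A += 1e-3 * rng.standard_normal(A.shape)
print(estimate_rank_lsj(A))  # 5
```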

Algorithm
Based on the previous analysis, we can derive a bilinear factorization based method for refining mocap data which is summarized in Algorithm 2.
We remark that the main complexity of Steps 3 and 4 in Algorithm 2 is O(3Jnd), where d given by (15) is very small (d ≪ min(3J, n)). It is also much faster to use (14) to apply (βC^TC + 2μI_n)^{−1}, because the DCT has a complexity of only O(n log n) per row, whereas directly computing the inversion has a complexity of O(n^3). Each iteration of Algorithm 2 ends by updating the multipliers Y_1, Y_2 and the penalty parameter μ.

Experimental results
In this section, we compare the proposed method only with TSMC, since TSMC performs more effectively than many existing approaches, such as linear interpolation, spline interpolation, DynaMMo and SVT. To evaluate the performance, we selected six motion sequences (walk, gymnastics, dance, acrobatics, basketball and boxing) from the CMU mocap dataset (http://mocap.cs.cmu.edu/), where the number of joints in a human skeleton is J = 31. Following [3], we simulate four classical situations to synthesize four kinds of corrupted data, briefly listed as follows: randomly corrupted data (rdcrupt), randomly lost data (rdlose), mixed corrupted data (mxcrupt) and regularly lost data (rglose). All experiments were implemented in Matlab (version R2010a) on a PC with an Intel Core i5-4210U CPU @ 1.70 GHz and 4 GB memory.
The parameters of our method were fixed across all experiments. To verify the refining accuracy, we adopt the root mean squared error (RMSE) measurement:

RMSE = sqrt( (1/n_p) Σ_i ‖x_i − x̃_i‖^2 ),

where x_i and x̃_i correspond to the ground-truth and refined poses, respectively, and n_p is the total number of imperfect entries, i.e. the missing and noisy entries.
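A minimal sketch of this error measure, restricted to the imperfect entries:

```python
import numpy as np

def rmse(X_true, X_refined, imperfect_mask):
    """Root mean squared error over the imperfect (missing or noisy) entries only."""
    diff = (X_true - X_refined)[imperfect_mask]
    return np.sqrt(np.mean(diff ** 2))

# Tiny example: two entries were imperfect, with errors 0.3 and 0.4.
X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
X_refined = np.array([[1.3, 2.0], [3.0, 3.6]])
mask = np.array([[True, False], [False, True]])
print(rmse(X_true, X_refined, mask))  # sqrt((0.3**2 + 0.4**2)/2) ~ 0.354
```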
For the four cases (rdcrupt, rdlose, mxcrupt and rglose), the RMSEs of TSMC (in red) and the proposed method (in blue) are shown in Figures 1 to 4, where we can see that our approach outperforms TSMC for most of the refined results in each case. The reason is that our approach exploits the true low-rank property of mocap data through the bilinear factorization, whereas TSMC relies on the partial SVD (pSVD) to solve model (3), which discards the information corresponding to the singular values below the pSVD threshold. Besides, Table 1 lists the elapsed CPU time, which shows that the time taken by our method is significantly reduced. As the number of frames increases, the time gap between TSMC and our method grows rapidly. For example, our method is only about 1 second faster than TSMC for "Walk", whose frame number is 343, but becomes about 7 times faster than TSMC for "Boxing" and "Basketball", whose frame numbers are around 5000. These statistics demonstrate that the proposed bilinear factorization and the DCT-based inverse calculation indeed accelerate our method.

Conclusion
In this paper, we have presented an efficient mocap data refinement method based on matrix bilinear factorization and the (inverse) discrete cosine transform. Experimental results show that the proposed method outperforms TSMC, and that its speed advantage grows as the number of frames in the motion sequence increases. However, the results in Figure 4 show that both methods suffer a sharp decline in performance in the case of regularly and continuously missing data, which needs to be explored further.

Since the other steps are similar, Algorithm 2 has much lower complexity than Algorithm 1.

Algorithm 2: the proposed method.
Input: X, O, Ω, parameters μ_max, α, β, γ, ε, λ;
Output: Y and E;
1: Initialize: estimate the rank r and compute d;

Figure 1. Comparisons of refined results for the case of randomly corrupted data (rdcrupt): Gaussian noise (σ = 2) was randomly added to 30% of the data of each motion sequence; the x-axis denotes the frame index and the y-axis the RMSE of each frame (cm/frame).

Figure 2. Comparisons of refined results for the case of randomly lost data (rdlose): 40% of the data of each motion sequence were randomly missing; the x-axis denotes the frame index and the y-axis the RMSE of each frame (cm/frame).

Figure 3. Comparisons of refined results for the case of mixed corrupted data (mxcrupt): 30% of the data were randomly missing and 30% of the remaining data were corrupted by Gaussian noise (σ = 2).

Figure 4. Comparisons of refined results for the case of regularly lost data (rglose): 30% of the data were removed by randomly selecting missing markers, where the number of selected markers is fixed to 10 and each marker misses 60 consecutive frames; the x-axis denotes the frame index and the y-axis the RMSE of each frame (cm/frame).

Table 1. The elapsed CPU time comparison between TSMC and the proposed method (in seconds), where the number in parentheses is the number of frames.