Gender-Specific Multi-Task Micro-Expression Recognition Using Pyramid CGBP-TOP Feature

Micro-expression recognition has attracted growing research interest in the field of computer vision. However, a micro-expression usually lasts only a fraction of a second, which makes it difficult to detect. This paper presents a new framework that recognizes micro-expressions using a pyramid histogram of the Centralized Gabor Binary Pattern from Three Orthogonal Planes (CGBP-TOP), an extension of the Local Gabor Binary Pattern from Three Orthogonal Planes (LGBP-TOP) feature. CGBP-TOP performs spatial and temporal analysis to capture the local facial characteristics of micro-expression image sequences. To retain more local facial information, CGBP-TOP is extracted from pyramid sub-regions of the micro-expression video frames. The combination of CGBP-TOP and the spatial pyramid faithfully represents the facial movements in micro-expression image sequences. However, the dimension of the pyramid CGBP-TOP is very high, which may lead to substantial data redundancy. In addition, people of different genders often express micro-expressions in different ways. Therefore, to select the features relevant to micro-expressions, we adopt a gender-specific sparse multi-task learning method with an adaptive regularization term that learns a compact subset of the pyramid CGBP-TOP feature for classifying the micro-expressions of both sexes. Extensive experiments on the widely used CASME II and SMIC databases demonstrate that our method effectively extracts micro-expression motion features from micro-expression video clips, and that the proposed approach achieves results comparable with state-of-the-art methods.


Introduction
Recognition of facial micro-expressions is important for a variety of real-life applications, including social security, psychological diagnosis and interrogation [Wang, Yan, Li et al. (2015); Lu, Luo, Zheng et al. (2014)]. Fig. 1 shows a diagram of a micro-expression recognition system, which mainly consists of micro-expression feature extraction and classification. Because micro-expressions are short and involve movements in only part of the face, local features from different facial regions are extremely important for micro-expression recognition. However, micro-expression recognition remains challenging because the face contains a large amount of information that is irrelevant to the micro-expression. The Local Gabor Binary Pattern from Three Orthogonal Planes (LGBP-TOP) feature was first proposed for, and proved effective in, conventional full facial expression recognition [Almaev and Valstar (2013)]. In this paper, we introduce a pyramid Centralized Gabor Binary Pattern from Three Orthogonal Planes (CGBP-TOP) descriptor for micro-expression feature extraction. CGBP-TOP improves on the existing LGBP-TOP feature: it is likewise a representation based on multi-resolution spatial and temporal analysis, but it uses Centralized Binary Pattern (CBP) codes instead of LBP codes. Because CBP preserves local structure more efficiently, CGBP-TOP has greater discriminative ability. The pyramid CGBP-TOP is motivated by the fact that local facial movements in different regions are of different importance for micro-expression recognition. Inspired by the advantages of the spatial pyramid, this paper combines the spatial pyramid with CGBP-TOP to further enhance its representation power for micro-expression image sequences and to improve recognition accuracy.
Considering that people of different genders often express micro-expressions in different ways, we propose a gender-specific sparse multi-task micro-expression classification framework that performs micro-expression classification for males and females simultaneously. To the best of our knowledge, the proposed method is the first gender-specific multi-task framework able to select common micro-expression features for different sexes. Fig. 2 shows the overall flowchart of the proposed micro-expression recognition algorithm. Extensive experiments on two popular micro-expression databases show that the proposed approach is competitive with state-of-the-art methods. The remainder of this paper is organized as follows. Section 2 reviews related work on micro-expression recognition. Section 3 introduces the extraction of the pyramid CGBP-TOP from micro-expression image sequences. Section 4 presents the gender-specific multi-task recognition method.

3 Pyramid CGBP-TOP feature
Almaev and Valstar [Almaev and Valstar (2013)] proposed LGBP-TOP for full facial expression recognition. In this paper, a spatiotemporal descriptor named CGBP-TOP is proposed and used to extract micro-expression features. As illustrated in Fig. 3, CGBP-TOP is extracted from the three orthogonal planes XY, XT and YT of the micro-expression video clip using Gabor filtering and CBP coding.
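The Gabor filtering step can be sketched as follows. This is a minimal illustration, assuming the classical complex Gabor kernel used by LGBP-style face descriptors, with 5 scales and 8 orientations as in the experimental settings. The values k_max = π/2, spacing √2, σ = 2π and the 31 × 31 kernel size are assumptions (common defaults), not parameters stated in this paper; SciPy's `scipy.signal.fftconvolve` performs the per-frame 2-D convolution.

```python
import numpy as np
from scipy.signal import fftconvolve


def gabor_kernel(mu, nu, size=31, k_max=np.pi / 2, spacing=np.sqrt(2), sigma=2 * np.pi):
    """Complex Gabor kernel with orientation index mu (0..7) and scale index nu."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = k_max / spacing ** nu              # wave-vector length for scale nu
    phi = np.pi * mu / 8                   # orientation angle, 8 orientations
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    # Oscillating carrier minus a constant term that suppresses the DC response.
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier


def gabor_magnitude_pictures(video, n_scales=5, n_orients=8):
    """GMPs of a (T, H, W) clip, returned with shape (n_scales, n_orients, T, H, W)."""
    video = np.asarray(video, dtype=float)
    gmps = np.empty((n_scales, n_orients) + video.shape)
    for nu in range(n_scales):
        for mu in range(n_orients):
            kernel = gabor_kernel(mu, nu)
            for t, frame in enumerate(video):
                # Convolve each frame separately and keep the frame size.
                gmps[nu, mu, t] = np.abs(fftconvolve(frame, kernel, mode="same"))
    return gmps


# Usage on a toy 4-frame, 32 x 32 clip.
clip = np.random.default_rng(0).random((4, 32, 32))
gmps = gabor_magnitude_pictures(clip)
print(gmps.shape)   # (5, 8, 4, 32, 32)
```

Each of the 40 magnitude sequences is then coded with CBP on the three orthogonal planes.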
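Each frame $f(x, y, t)$ of the micro-expression video clip is first convolved with a bank of Gabor filters, and the magnitude of the response forms a Gabor magnitude picture (GMP):
$$G_{\mu,\nu}(x, y, t) = \left| f(x, y, t) \ast \psi_{\mu,\nu}(z) \right| \qquad (1)$$
where $\ast$ denotes the convolution operator. The Gabor filters used in our approach are defined as:
$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\!\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[ \exp\left(i\, k_{\mu,\nu} \cdot z\right) - \exp\!\left(-\frac{\sigma^2}{2}\right) \right] \qquad (2)$$
where $z = (x, y)$, and $\nu$ and $\mu$ define the scale and orientation of the Gabor filters, respectively, with
$$k_{\mu,\nu} = k_\nu e^{i\phi_\mu}, \qquad k_\nu = \frac{k_{\max}}{\lambda^{\nu}}, \qquad \phi_\mu = \frac{\pi\mu}{8} \qquad (3)$$
where $k_{\max}$ is the maximum frequency and $\lambda$ is the spacing factor between filters in the frequency domain.
CBP is then applied to each GMP. For a center pixel $(x_c, y_c)$, CBP compares the pairs of neighbors connected through the center, i.e., neighbors that are symmetric about it, and additionally compares the center pixel with the mean of the neighborhood. The typical 3×3 neighborhood CBP operator is computed as follows:
$$\mathrm{CBP}(x_c, y_c) = \sum_{p=0}^{3} s\left(g_p - g_{p+4}\right) 2^{p} + s\!\left(g_c - \frac{1}{9}\Big(g_c + \sum_{p=0}^{7} g_p\Big)\right) 2^{4}, \qquad s(u) = \begin{cases} 1, & |u| \ge \tau \\ 0, & |u| < \tau \end{cases} \qquad (4)$$
where $g_c$ is the GMP value at the center pixel, $g_p$ ($p = 0, \dots, 7$) are the values of its eight neighbors taken in circular order, and $\tau$ is a small threshold. The CBP code therefore has five bits and takes $2^5 = 32$ values. Finally, the CGBP-TOP histograms (i.e., CGBP-XY, CGBP-XT and CGBP-YT) can be computed as:
$$H_{\mu,\nu,i}(l) = \sum_{x, y, t} I\left\{\mathrm{CBP}_{\mu,\nu,i}(x, y, t) = l\right\}, \qquad l = 0, 1, \dots, L-1 \qquad (5)$$
where $I\{\cdot\}$ equals 1 if its argument is true and 0 otherwise, $\mathrm{CBP}_{\mu,\nu,i}(x, y, t)$ is the CBP code of $G_{\mu,\nu}$ at $(x, y, t)$ computed on the $i$th plane, $L$ is the dimension of CBP ($L = 32$), and $i$ ($i = 1, 2, 3$) denotes the planes XY, XT and YT.
Because micro-expressions are characterized by partial-face movements, multilayered low-level features extracted from different local regions are important for recognition. In this paper, a pyramid CGBP-TOP representation is used to extract micro-expression features. As shown in Fig. 4, we first extract the global CGBP-TOP from the input video clip. Next, we place a sequence of increasingly finer grids over the frames, one grid per pyramid level. The CGBP histograms of the XY, XT and YT planes are then extracted from each grid cell on each level and concatenated to form a large histogram. Let $c(g)$ denote the number of cells on level $g$, and let $H_i^{(g,j)}$ denote the CGBP histogram (over all scales and orientations) of the $j$th cell on level $g$ for the $i$th plane. The feature vector of the CGBP at the $i$th plane is defined as follows:
$$F_i = \left[ H_i^{(g,j)} \right]_{g = 0, 1, 2;\; j = 1, \dots, c(g)} \qquad (6)$$
Because micro-expression movements usually appear in small local regions, we down-weight the features extracted from larger cells: each cell histogram is multiplied by a weight that depends on its pyramid level. Specifically, to avoid the curse of dimensionality, we construct grids at resolutions 0, 1 and 2 (i.e., 1×1, 2×2 and 4×4 cells, so that $c(g) = 4^g$ and there are 1 + 4 + 16 = 21 cells in total). The weights associated with the three levels are set to 1/4, 1/4 and 1/2, respectively, so that finer levels, whose cells are smaller, receive larger weights. Finally, the pyramid CGBP-TOP is obtained by concatenating the appropriately weighted, normalized histograms of all levels. Its dimension is 3 × (number of scales) × (number of orientations) × 32 × 21, i.e., 80640 for the 5 scales and 8 orientations used in our experiments.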
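The CBP coding and three-plane histogramming for one Gabor magnitude sequence can be sketched as below. This is a minimal illustration, assuming the centre-symmetric CBP form (four comparisons between opposite neighbours plus one centre-versus-mean comparison, thresholded at |u| ≥ τ), which yields 2^5 = 32 codes. The threshold τ = 0.01 and the use of every slice along each axis are assumptions, not settings reported in the paper.

```python
import numpy as np

# Eight neighbours (dy, dx) in circular order; entries p and p + 4 are
# symmetric about the centre pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def cbp_codes(plane, tau=0.01):
    """5-bit CBP codes (0..31) for every interior pixel of a 2-D array."""
    plane = np.asarray(plane, dtype=float)
    h, w = plane.shape
    center = plane[1:h - 1, 1:w - 1]
    neigh = [plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in OFFSETS]

    def s(u):
        # Thresholded comparison: 1 where |u| >= tau, else 0.
        return (np.abs(u) >= tau).astype(np.int64)

    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(4):                                # four centre-symmetric pairs
        codes += s(neigh[p] - neigh[p + 4]) << p
    mean = (sum(neigh) + center) / 9.0                # mean of the 3 x 3 neighbourhood
    codes += s(center - mean) << 4                    # centre-versus-mean bit
    return codes


def cbp_top_histogram(volume, tau=0.01, n_bins=32):
    """Concatenated, L1-normalised CBP histograms of the XY, XT and YT planes
    of one (T, H, W) volume, e.g. one Gabor magnitude sequence."""
    volume = np.asarray(volume, dtype=float)
    n_t, n_y, n_x = volume.shape
    planes = (
        [volume[t, :, :] for t in range(n_t)],        # XY planes, one per frame
        [volume[:, y, :] for y in range(n_y)],        # XT planes, one per row
        [volume[:, :, x] for x in range(n_x)],        # YT planes, one per column
    )
    hists = []
    for slices in planes:
        codes = np.concatenate([cbp_codes(sl, tau).ravel() for sl in slices])
        hist = np.bincount(codes, minlength=n_bins).astype(float)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)                      # length 3 * n_bins


patch = np.array([[10.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]])
print(cbp_codes(patch, tau=0.5))   # -> [[17]]: bit 0 (10 vs 0) plus bit 4 (5 vs mean 15/9)

hist = cbp_top_histogram(np.random.default_rng(0).random((5, 8, 8)))
print(hist.shape)                  # (96,)
```

Applying `cbp_top_histogram` to each of the 5 × 8 magnitude sequences of a cell gives 40 × 96 = 3840 values per cell.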
CGBP-TOP is effective because it combines multi-scale, multi-orientation Gabor decomposition with local CBP histogram modeling.
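The three-level pyramid with level weights 1/4, 1/4 and 1/2 can be sketched as below, together with a check of the dimension arithmetic. This is a minimal illustration, assuming equal-size grid cells with integer boundaries that span the whole clip in time; `intensity_hist` is a hypothetical toy stand-in for the per-cell CGBP-TOP histogram, used only to keep the example self-contained.

```python
import numpy as np

LEVEL_WEIGHTS = (0.25, 0.25, 0.5)   # weights for pyramid levels 0, 1, 2


def pyramid_feature(volume, cell_descriptor, weights=LEVEL_WEIGHTS):
    """Concatenate weighted, L1-normalised cell descriptors over a spatial pyramid.

    Level g splits every frame into a 2**g x 2**g grid, i.e. c(g) = 4**g cells,
    and each cell keeps the whole temporal extent of the (T, H, W) volume.
    """
    volume = np.asarray(volume, dtype=float)
    _, height, width = volume.shape
    parts = []
    for g, weight in enumerate(weights):
        n = 2 ** g
        ys = np.linspace(0, height, n + 1).astype(int)   # row boundaries of the grid
        xs = np.linspace(0, width, n + 1).astype(int)    # column boundaries of the grid
        for r in range(n):
            for c in range(n):
                cell = volume[:, ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
                hist = np.asarray(cell_descriptor(cell), dtype=float)
                hist = hist / max(hist.sum(), 1e-12)      # normalise each cell histogram
                parts.append(weight * hist)
    return np.concatenate(parts)


def intensity_hist(cell):
    # Hypothetical toy descriptor: 32-bin intensity histogram of the cell.
    return np.histogram(cell, bins=32, range=(0.0, 1.0))[0]


clip = np.random.default_rng(0).random((4, 80, 80))
feature = pyramid_feature(clip, intensity_hist)
print(feature.shape)              # (672,) = 21 cells x 32 bins

n_cells = sum(4 ** g for g in range(3))
print(n_cells)                    # 21
print(3 * 5 * 8 * 32 * n_cells)   # 80640
```

Because each cell histogram is normalised before weighting, the level-0 block sums to 1/4, level 1 to 4 × 1/4 and level 2 to 16 × 1/2.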

4 Gender-specific multi-task micro-expression classification
Multi-task learning can improve the generalization performance of a learning task with the help of other related tasks, and it learns a common feature representation from multiple tasks. In this paper, the dimension of the pyramid CGBP-TOP is very high, which may lead to the curse of dimensionality. Moreover, people of different genders often express micro-expressions in different ways, so it is worthwhile to learn a common micro-expression feature for both sexes. Therefore, to select the micro-expression features relevant to both males and females, we formulate micro-expression recognition as a multi-task classification problem in which each learning task is the micro-expression classification problem of one gender. Since there are two gender labels, two micro-expression classification tasks are learned from the corresponding datasets $D_t = \{(x_i^t, y_i^t)\}_{i=1}^{n_t}$, $t = 1, 2$, where $x_i^t$ is the pyramid CGBP-TOP feature of the $i$th sample of task $t$ and $y_i^t$ denotes its micro-expression label. T is the number of micro-expression labels in the dataset. The gender-specific multi-task joint sparse representation is then obtained by solving an optimization problem that combines a loss term over the samples of both tasks with a regularization term, where $\ell$ is a loss function (e.g., the hinge loss).
$f_t(x)$ is the decision function to be learned for each micro-expression task, and it can be formulated in terms of kernel functions. $C$ is a trade-off parameter, and $\Omega$ is a sparsity-inducing regularization term defined over the $M$ kernel functions that make up the decision function and controlled by a norm exponent $q$. When $q = 2$, $\Omega$ is the $\ell_2$-norm regularization term. To fit a given dataset, however, we employ the more general $\ell_q$-norm regularization term with $1 \le q \le 2$. By choosing $q$ in this interval, we expect to obtain a penalty term adapted to the given micro-expression dataset. Regarding the complexity of this multi-task classification framework, learning the $T$ kernel decision functions is exactly an SVM training process, and each SVM training scales as $O(n^3)$, with $n$ being the number of support vectors related to task $t$.
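The exact objective is not recoverable from the text here. As a hedged illustration only, one common way to build a sparsity-inducing term over M kernels shared by several tasks is a mixed ℓ2/ℓq norm: take the ℓ2 norm of each kernel's function norms across the two gender tasks, then the ℓq norm across kernels. With q = 2 this reduces to a plain ℓ2 norm, while q close to 1 acts like a group-lasso penalty that can switch off a kernel for both tasks jointly. The function names, the `func_norms[t, m]` layout and the ½Ω² + C·(hinge loss) combination are assumptions, not the paper's formulation.

```python
import numpy as np


def mixed_norm_regularizer(func_norms, q):
    """Mixed l2/lq norm over kernels (hypothetical form of Omega).

    func_norms[t, m] is the RKHS norm of the part of task t's decision
    function that uses kernel m (t = 0, 1 for the two gender tasks).
    """
    if not 1.0 <= q <= 2.0:
        raise ValueError("q must lie in [1, 2]")
    norms = np.asarray(func_norms, dtype=float)
    per_kernel = np.sqrt((norms ** 2).sum(axis=0))       # l2 across tasks, one value per kernel
    return float((per_kernel ** q).sum() ** (1.0 / q))   # lq across kernels


def hinge_loss(labels, scores):
    """Hinge loss max(0, 1 - y * f(x)) for binary labels in {-1, +1}."""
    y = np.asarray(labels, dtype=float)
    return np.maximum(0.0, 1.0 - y * np.asarray(scores, dtype=float))


def objective(func_norms, q, C, labels_per_task, scores_per_task):
    """Hypothetical objective: 0.5 * Omega**2 + C * total hinge loss over all tasks."""
    loss = sum(hinge_loss(y, s).sum() for y, s in zip(labels_per_task, scores_per_task))
    return 0.5 * mixed_norm_regularizer(func_norms, q) ** 2 + C * loss


norms = [[3.0, 4.0],    # task 0: norms for kernels 0 and 1
         [0.0, 0.0]]    # task 1
print(mixed_norm_regularizer(norms, q=2))   # 5.0 (plain l2 norm)
print(mixed_norm_regularizer(norms, q=1))   # 7.0 (sum of per-kernel norms)
```

Sweeping q between 1 and 2 moves the penalty between these two behaviours, which is the kind of adaptivity the text describes.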

Experimental results and discussions
In this section, experiments are conducted on two popular micro-expression databases: the Chinese Academy of Sciences Micro-Expression database II (CASME II) and the Spontaneous MICro-expression database (SMIC) [Li, Pfister and Huang (2013)]. The micro-expression samples in CASME II were collected from 26 subjects, comprising 14 females and 22 males. There are more than 240 micro-expression video clips, which mainly cover five micro-expression categories: happy, disgust, repression, surprise and others. The "others" category contains a mixture of different emotions, so it is not considered in our experiments. The SMIC database was collected from 16 subjects, comprising 6 females and 10 males, and covers three micro-expression categories: positive, negative and surprise. However, the surprise category is not used in our evaluation because it contains too few samples. The implementation details and experimental results are described in the following subsections.

Experimental settings
In our experiments, eye positions are first located by an AdaBoost classifier based on Haar features. Facial regions are then aligned based on the eye positions and cropped to 80×80 pixels. For the pyramid CGBP-TOP, we employ Gabor filters with 5 scales and 8 orientations. The dimension of the CBP histogram is 32. In the pyramid, we construct grids at resolutions 0, 1 and 2, and the weights associated with these levels are set to 1/4, 1/4 and 1/2, respectively. We thus obtain a pyramid CGBP-TOP feature histogram of dimensionality 80640 (3 planes × 5 scales × 8 orientations × 32 bins × 21 cells). Gaussian kernel functions are employed in the decision functions for multi-task classification. In all experiments, we use leave-one-subject-out (LOSO) cross-validation to evaluate our approach: the samples of one subject are used as test data and the remaining samples serve as training data. The average accuracy is reported as the final measure of micro-expression recognition performance. The hyperparameters C and q are tuned by validation: C is selected from 10 values sampled logarithmically from the interval [0.1, 100], while q is chosen from [1, 2].
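The LOSO protocol and the C grid can be sketched with NumPy alone. This is a minimal illustration: `fit_predict` is a hypothetical callback standing in for training the multi-task classifier, reporting the mean of per-subject accuracies is one reading of "average accuracy" (the text does not say whether accuracies are averaged per subject or pooled), and the five-point q grid is an assumption since only the interval [1, 2] is given.

```python
import numpy as np

C_GRID = np.logspace(-1, 2, 10)     # 10 log-spaced values from 0.1 to 100
Q_GRID = np.linspace(1.0, 2.0, 5)   # hypothetical grid; the text only gives q in [1, 2]


def loso_splits(subject_ids):
    """Yield (subject, train_idx, test_idx) for leave-one-subject-out CV."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == subject)
        train_idx = np.flatnonzero(subject_ids != subject)
        yield subject, train_idx, test_idx


def loso_accuracy(labels, subject_ids, fit_predict):
    """Mean of per-subject accuracies.

    fit_predict(train_idx, test_idx) must train on train_idx only and
    return predicted labels for test_idx.
    """
    labels = np.asarray(labels)
    fold_acc = []
    for _, train_idx, test_idx in loso_splits(subject_ids):
        pred = np.asarray(fit_predict(train_idx, test_idx))
        fold_acc.append(np.mean(pred == labels[test_idx]))
    return float(np.mean(fold_acc))


labels = np.array([0, 1, 1, 0, 1, 1])
subjects = np.array([1, 1, 2, 2, 3, 3])


def majority_vote(train_idx, test_idx):
    # Toy classifier: predict the most frequent training label for every test sample.
    majority = np.bincount(labels[train_idx]).argmax()
    return np.full(len(test_idx), majority)


print(loso_accuracy(labels, subjects, majority_vote))
```

To avoid leaking test subjects into model selection, C and q would be chosen by running the same LOSO loop on the training subjects of each outer fold.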

Comparison with other methods
Micro-expression recognition experiments are conducted on both datasets to evaluate whether the proposed method achieves better overall recognition accuracy than state-of-the-art approaches, including DTCM [Lu, Luo, Zheng et al. (2014)], LBP-SIP [Wang, See, Phan et al. (2014)], tensor [Zheng, Geng and Yang (2016)], Histogram of Oriented Optical Flow (HOOF) [Polikovsky, Kameda and Ohta (2009)], LBP-TOP [Guo, Tian, Gao et al. (2014)], Selective CNN [Patel, Hong and Zhao (2017)] and LSTM [Kim, Baddar and Ro (2016)]. We also implement a multi-task classification method based on the LGBP-TOP feature [Almaev and Valstar (2013)] as a baseline to validate the effectiveness of the CGBP-TOP feature for micro-expression recognition. The results in Tab. 1 and Tab. 2 show that our method achieves the best or nearly the best results. Overall, our method improves on the classical LBP-TOP by 9.0% on the CASME II database and by 4.1% on the SMIC database. Compared with the LGBP-TOP baseline, the improvement is 4.6% on CASME II and 3.6% on SMIC. We attribute these improvements to two factors: the pyramid CGBP-TOP captures more of the texture characteristics of micro-expressions, and the gender-specific multi-task learning framework helps learn a common representation for both sexes. Moreover, our approach outperforms not only handcrafted-feature approaches such as HOOF, LBP-SIP and the tensor representation, but also some deep learning methods such as Selective CNN. This suggests that the pyramid CGBP-TOP captures hierarchical features from micro-expression image sequences, somewhat as deep learning methods do, and that CGBP-TOP represents micro-expression faces faithfully.
Recoverable table entries (SMIC, recognition accuracy in %): LGBP-TOP 55.8; Ours 59.4.

The effectiveness of the pyramid
We conduct experiments within our multi-task learning framework using CGBP-TOP extracted on only one pyramid level, to validate the effectiveness of the pyramid form of our CGBP-TOP feature, which extends LGBP-TOP [Almaev and Valstar (2013)]. In particular, each video frame is divided into non-overlapping rectangular sub-regions of a specific size, and we compare partitions of 1×1, 2×2, 3×3, 4×4, 5×5, 6×6, 7×7 and 8×8 blocks. Fig. 5 shows that on the CASME II database the highest recognition rate is achieved with the coarser 2×2 partition, whereas Fig. 6 shows that on the SMIC database the highest recognition rate is obtained with the finer 5×5 partition. Thus, neither coarse nor fine partitions are guaranteed to give good results. However, the recognition rates of our pyramid CGBP-TOP feature are always higher than those of CGBP-TOP extracted with any single partition on both datasets. This is because micro-expression movements usually appear in different local regions of the face, so neither the global feature nor a single-scale local feature is sufficient for recognizing the local movements of facial micro-expressions. It is therefore necessary to extract hierarchical features for micro-expression recognition. These results indicate the effectiveness of our pyramid CGBP-TOP feature, which enhances discriminative power by combining the merits of the spatial pyramid and the CGBP-TOP feature.

Time consumption analysis
To analyze the computational efficiency of our micro-expression recognition method, we report the time consumed by feature extraction and by classification separately. Extracting the pyramid CGBP-TOP takes about 330 seconds per video clip, and the optimization of our gender-specific multi-task learning method takes about 0.28 seconds per trial. The experiments are conducted in MATLAB on a dual-core Intel i5 CPU. Although kernel functions are used in the decision functions of our multi-task learning framework, wrapper techniques make the training process very efficient. These numbers show that training our multi-task classifier is not time-consuming. However, applying the spatial pyramid scheme to the low-level CGBP-TOP feature is very time-consuming because of the high dimensionality of the CGBP-TOP feature.

Conclusions
This paper proposes a gender-specific multi-task facial micro-expression recognition method based on the pyramid CGBP-TOP feature, motivated by the observation that micro-expression movements usually appear in different local regions of the face. Experiments show the promising performance of the proposed method compared with state-of-the-art methods. Notably, our approach outperforms not only approaches based on other handcrafted features but also some deep-learning-based methods. However, the number of pyramid levels of our feature is selected manually in this paper. Future work will focus on automatically selecting the number of pyramid levels in order to design more effective micro-expression features.