Local subspace smoothness alignment for constrained local model fitting
Introduction
In face recognition, the representation of face images is an important issue. Although gray-scale face images can be used directly as input by some methods, such as non-negative matrix factorization [1] and sparse representation classification [2], these methods usually assume that the face images are cropped, or that facial features with the same semantics are well aligned across images. This is usually not the case in practice. In fact, aligning facial feature points is difficult enough to be a separate research topic in face recognition [3], [4], [5], [6], [7], [8], known as face alignment or facial landmark estimation. The successful registration and tracking of non-rigidly varying geometric landmarks on the face has become a key ingredient of automatic facial analysis systems [9], [10], [11], [12].
The challenges of facial landmark estimation come mainly from the variability of the appearance patches centered on the landmarks, caused by lighting, occlusion, expression and so on. Many approaches to accurate non-rigid facial registration and face tracking focus on building a synthesis model that reconstructs the landmarks of a possibly unseen face image from the facial shapes and appearances of training images. One of the best-known models is the active shape model (ASM) [13], which derives the positions of the landmarks from statistical information about the landmark distribution. In ASM, the point distribution model (PDM) [13] is used to model the valid shape space of face landmarks with a set of deformation parameters.
The constrained local model (CLM) [14] is another well-known approach to non-rigid face registration and tracking. CLM generalizes ASM in the sense that ASM searches for potential facial landmarks in a 1D space, while CLMs are based on 2D response maps. The 2D response maps are usually estimated by a discriminative local appearance model and capture the appearance information around facial landmarks better; used well, this information should give better results.
Many CLM variations have been proposed recently. These methods pursue the same goal as CLM but use more robust and complex models of the landmark response distribution. In particular, the search strategy of the original CLM rests on the hypothesis that the locations of facial landmarks follow an isotropic Gaussian distribution, which is not realistic. Many methods therefore use an anisotropic Gaussian instead [15], [16], [17]. Although the anisotropic Gaussian approximation of the response maps overcomes some drawbacks of its isotropic counterpart, its performance can still be poor, especially when the facial appearance changes considerably. To address this, other models have been investigated as well, such as the Gaussian mixture model (GMM) [18] and nonparametric models [19]. Additionally, some works focus on improving the quality of the response maps [20], [21], [22], [23]. These methods share a common divide-and-conquer characteristic: each independently trains a dedicated local model (detector, regressor, or part template) for every feature point. They are therefore known as local methods.
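As a minimal sketch of the difference between the two approximations, the Python snippet below moment-matches both an isotropic and an anisotropic Gaussian to a synthetic response map. The function name `fit_gaussians` and the toy response are assumptions made for illustration; the cited methods may estimate their Gaussians differently.

```python
import numpy as np

def fit_gaussians(response):
    """Moment-match Gaussians to a non-negative 2D response map.

    Returns the weighted mean (x, y), the full 2x2 covariance
    (anisotropic fit) and one shared variance (isotropic fit).
    """
    h, w = response.shape
    weights = (response / response.sum()).ravel()   # treat the map as a probability mass
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    mean = weights @ coords                         # weighted mean location
    centred = coords - mean
    cov = (centred * weights[:, None]).T @ centred  # weighted 2x2 covariance
    iso_var = np.trace(cov) / 2.0                   # average of the two axis variances
    return mean, cov, iso_var

# Hypothetical response map: a blob that is much wider along x than along y.
ys, xs = np.mgrid[0:21, 0:21]
response = np.exp(-((xs - 10) ** 2 / (2 * 16.0) + (ys - 10) ** 2 / (2 * 1.0)))
mean, cov, iso_var = fit_gaussians(response)
```

On such an elongated response the isotropic fit collapses the two axis variances into one value, so it overstates the uncertainty along the narrow axis and understates it along the wide one; the full covariance keeps the two apart.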
In contrast, methods that consider all the feature points as a whole, rather than treating them as conditionally independent, are regarded as holistic methods. The active appearance model (AAM) [24] is the representative example: it simultaneously models the intrinsic variation in both appearance and shape as a linear combination of basis modes of variation. Among the holistic methods, explicit shape regression (ESR) [25], the supervised descent method (SDM) [26], the ensemble of regression trees (ERT) [27] and local binary features (LBF) [3] are four state-of-the-art methods. All of them operate under the cascaded shape regression framework using shape-indexed features. ESR directly learns a regression function that infers the shape from a sparse subset of pixel intensities indexed relative to the current shape estimate, while ERT replaces the weak fern regressor in ESR with a regression tree, which further improves performance. SDM employs cascaded linear regression to estimate the shape from hand-designed SIFT features, while LBF learns a set of highly discriminative local binary features for each feature point independently and then uses the learned features jointly to learn a linear regression for the final prediction, which is highly efficient and accurate.
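The cascaded shape regression framework shared by these methods can be illustrated with a toy Python sketch. It is written in the spirit of cascaded linear regression and is not an implementation of SDM, ESR, ERT or LBF. The feature function is a hypothetical stand-in for SIFT or pixel features: in this toy, the "image" is represented by the true landmark vector itself, and the feature is a saturating function of the offset between that vector and the current estimate. All names and parameters are assumptions.

```python
import numpy as np

def shape_indexed_features(images, shapes):
    """Toy features sampled 'around' the current shape estimate (assumption)."""
    feats = np.tanh(images - shapes)                       # saturates for large offsets
    return np.hstack([feats, np.ones((len(feats), 1))])    # append a bias column

def train_cascade(images, true_shapes, init_shapes, n_stages=4, ridge=1e-3):
    """Learn one ridge-regularised linear regressor per cascade stage."""
    stages, current = [], init_shapes.copy()
    for _ in range(n_stages):
        feats = shape_indexed_features(images, current)
        target = true_shapes - current                     # residual still to be explained
        gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
        weights = np.linalg.solve(gram, feats.T @ target)
        current = current + feats @ weights                # additive shape update
        stages.append(weights)
    return stages

def run_cascade(stages, images, init_shapes):
    """Apply the learned stages in order, re-extracting features each time."""
    current = init_shapes.copy()
    for weights in stages:
        current = current + shape_indexed_features(images, current) @ weights
    return current

rng = np.random.default_rng(0)
n_dims = 10                                                # 5 landmarks, (x, y) each

def make_set(n):
    true = rng.normal(size=(n, n_dims))
    return true, true + 2.0 * rng.normal(size=(n, n_dims))  # perturbed initial shapes

train_true, train_init = make_set(400)
test_true, test_init = make_set(100)
stages = train_cascade(train_true, train_true, train_init)
test_final = run_cascade(stages, test_true, test_init)
err_before = np.linalg.norm(test_init - test_true, axis=1).mean()
err_after = np.linalg.norm(test_final - test_true, axis=1).mean()
```

Because the features are recomputed relative to the updated shape at every stage, each regressor only has to correct what the previous stages left unexplained.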
Although these methods have achieved partial success in face alignment, the limitations of CLMs remain. In particular, most CLM-based models use the PDM to model the shape space. The PDM is essentially a linear approximation to the deformations of a non-rigid object's shape, combined with a global rigid transformation. Compared with the sophistication of the response distribution models, the PDM is too coarse: because the facial shape space is highly nonlinear and non-convex, the linear analysis used by the PDM is far from adequate.
In this paper, we propose a novel manifold learning method, local subspace smoothness alignment (LSSA), to address this issue. The LSSA approach smooths the nonlinear structure directly in the original feature space, using a newly defined geometric measure of the curvature of the local structures. After performing the LSSA transformation, we use the adjacent shapes for CLM fitting in an ensemble of correlated local subspaces.
The rest of this paper is organized as follows. Background on the PDM and manifold learning is given in Section 2. The motivation and details of local subspace smoothness alignment are described in Section 3. CLM fitting with an ensemble of local subspaces learnt from LSSA is presented in Section 4. Section 5 reports comparison experiments against existing manifold learning methods, together with extensive experiments demonstrating the importance of the manifold prior in CLM fitting. We conclude the paper in Section 6.
The point distribution model
Both ASM and CLM use the point distribution model (PDM) to model the shape space. Specifically, based on principal component analysis (PCA), the PDM reconstructs the facial shape of an unseen face image linearly:
$$x_i = sR(\bar{x}_i + \Phi_i q) + t,$$
where $x_i$ is the location of the $i$-th landmark and $\bar{x}_i$ is its mean location; $R$, $s$ and $t$ control the rigid rotation, scale and translation respectively, while $q$ controls the non-rigid variations of the shape and $\Phi_i$ denotes the submatrix of the basis of variations. Then all the parameters of the shape model can be denoted as
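The following Python sketch illustrates a PCA-based PDM, assuming the commonly used form in which each landmark is the mean location plus a linear combination of basis vectors, followed by a global similarity transform. The function names, the toy square template and the assumption that training shapes are already rigidly aligned are all illustrative and do not come from the paper.

```python
import numpy as np

def learn_pdm(shapes, n_modes):
    """PCA on pre-aligned training shapes of size (N, n_points, 2)."""
    flat = shapes.reshape(len(shapes), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_modes].T                 # columns are the modes of variation
    return mean, basis

def pdm_shape(mean, basis, q, scale=1.0, angle=0.0, t=(0.0, 0.0)):
    """Generate landmarks as scale * R * (mean_i + Phi_i q) + t for every i."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])      # 2D rotation matrix R
    nonrigid = (mean + basis @ q).reshape(-1, 2)
    return scale * nonrigid @ rot.T + np.asarray(t)

# Hypothetical training set: a unit square whose right edge slides along x.
rng = np.random.default_rng(0)
template = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
stretch = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
shapes = template + rng.normal(size=50)[:, None, None] * stretch

mean, basis = learn_pdm(shapes, n_modes=1)
q = basis.T @ (shapes[0].ravel() - mean)   # non-rigid parameters of one shape
recon = pdm_shape(mean, basis, q)
shifted = pdm_shape(mean, basis, np.zeros(1), t=(2.0, 3.0))
```

Since the toy data vary along a single direction, one mode reconstructs every training shape exactly; real face shapes need more modes, and no choice of `q` can leave the linear span of the basis.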
Local subspace smoothness alignment (LSSA)
In this section, we describe our local subspace smoothness alignment method, which overcomes some limitations of the traditional manifold learning methods.
Face alignment with an ensemble of local subspaces
In this section, we show how to apply the proposed method to face alignment, which improves the robustness of CLM fitting compared with the traditional PDM. Consider the shape formed by concatenating the locations of the facial key points estimated with discriminative detectors. One of the most important components of a face alignment system is to verify whether this newly estimated shape is a valid “face” shape, and further to recommend a better one
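To make the idea of validating a shape against local subspaces concrete, here is a hedged Python sketch. It uses plain local PCA over the nearest training shapes as a stand-in, not the LSSA transformation itself, whose details are not given in this excerpt. The function name, the parameters `k` and `n_modes`, and the circular toy manifold are assumptions.

```python
import numpy as np

def project_to_local_subspace(x, train, k=10, n_modes=1):
    """Project x onto a PCA subspace fitted to its k nearest training shapes.

    Returns the projected shape and the distance from x to it; a large
    distance suggests that x lies far from the locally valid shapes.
    """
    dists = np.linalg.norm(train - x, axis=1)
    nbrs = train[np.argsort(dists)[:k]]             # adjacent training shapes
    mean = nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(nbrs - mean, full_matrices=False)
    basis = vt[:n_modes].T                          # local modes of variation
    projected = mean + basis @ (basis.T @ (x - mean))
    return projected, np.linalg.norm(x - projected)

# Toy "shape manifold": a circle in the plane z = 0, embedded in 3D.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
train = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)

valid = np.array([np.cos(0.7), np.sin(0.7), 0.0])   # a point on the manifold
noisy = valid + np.array([0.0, 0.0, 0.3])           # detector estimate pushed off it
projected, residual = project_to_local_subspace(noisy, train)
```

A single global linear subspace cannot follow the circle, whereas a subspace fitted to adjacent samples approximates it well near the query point; this is the motivation for working with an ensemble of local subspaces.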
Experiments
In this section, we present experiments on two tasks. The first is manifold learning, in which we compare the proposed method with several classic manifold learning methods. We then apply our method to face alignment and verify its performance on two challenging face databases, i.e., the LFPW database [21] and the LFW database [42].
Conclusion
In this paper, we present a novel manifold-based method for constrained local model fitting, named local subspace smoothness alignment (LSSA). The LSSA method learns the manifold in the original dimensionality with a new geometric measure of the curvature of local structures. Based on the learnt manifold, we introduce an improved face alignment method under the constrained local model (CLM) framework. It performs robust CLM fitting in the original feature space but using adjacent deformations of
Acknowledgments
We thank the anonymous reviewers for their in-depth comments, suggestions, and corrections, which have greatly improved the paper. This work is partially supported by National Science Foundation of China (61373060) and Qing Lan Project.
Dakun Liu was born in 1984. He received his BS and MS degrees in applied mathematics in 2006 and 2009 respectively. He received his Ph.D. degree in computer science and technology from Nanjing University of Aeronautics and Astronautics in 2016. Now he is a lecturer in Yancheng Institution of Technology. His research interests include computer vision, machine learning, etc.
References (47)
- et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
- et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
- S. Ren, X. Cao, Y. Wei, J. Sun, Face alignment at 3000fps via regressing local binary features, in: 2014 IEEE...
- et al., Face alignment by explicit shape regression, Int. J. Comput. Vis. (2014)
- D. Lee, H. Park, C.D. Yoo, Face alignment using cascade Gaussian process regression trees, in: Proceedings of the IEEE...
- S. Zhu, C. Li, C.C. Loy, X. Tang, Face alignment by coarse-to-fine shape searching, in: Proceedings of the IEEE...
- et al., Low rank driven robust facial landmark regression, Neurocomputing (2015)
- Y. Yang, Y. Su, D. Cai, M. Xu, Nonlinear deformation learning for face alignment across expression and pose,...
- Z. Cui, S. Shan, H. Zhang, S. Lao, X. Chen, Image sets alignment for video-based face recognition, in: 2012 IEEE...
- V. Le, J. Brandt, Z. Lin, L. Bourdev, T.S. Huang, Interactive facial feature localization, in: Computer Vision—ECCV...
- Automatic face annotation in TV series by video/script alignment, Neurocomputing
- Active shape models – their training and application, Comput. Vis. Image Underst.
- Estimating uncertainty in SSD-based feature tracking, Image Vis. Comput.
- An information fusion framework for robust shape tracking, IEEE Trans. Pattern Anal. Mach. Intell.
- Deformable model fitting by regularized landmark mean-shift, Int. J. Comput. Vis.
- Localizing parts of faces using a consensus of exemplars, IEEE Trans. Pattern Anal. Mach. Intell.
Xiaoyang Tan was born in 1971. He received the Ph.D. degree in machine learning from Nanjing University in 2005. He is a professor and Ph.D. supervisor at Nanjing University of Aeronautics and Astronautics. His research interests include computer vision, machine learning, etc.