Elsevier

Neurocomputing

Volume 214, 19 November 2016, Pages 785-795
Local subspace smoothness alignment for constrained local model fitting

https://doi.org/10.1016/j.neucom.2016.07.007

Abstract

The constrained local model (CLM) is a classic method for facial landmark estimation. While the CLM enhances the well-known active shape model with discriminative local appearance models, its shape model is based on the point distribution model, which is essentially principal component analysis over the training facial shape vectors; hence the nonlinear manifold of facial shapes is not well embedded. In this paper, we propose a novel manifold learning method, local subspace smoothness alignment (LSSA), to address this issue. The LSSA approach smooths the nonlinear structure directly in the original feature space, using a newly defined geometric measure of the curvature of the local structures. We then apply this method to face alignment, using an ensemble of correlated local subspaces derived from LSSA. Experiments on both toy data and real-world datasets show that the proposed method yields a reasonable manifold embedding and leads to encouraging face alignment performance even under difficult conditions.

Introduction

In face recognition, the representation of face images is a very important issue. Although gray-scale face images can be used directly as input in some methods, such as non-negative matrix factorization [1] and sparse representation classification [2], these methods usually assume that the face images are cropped or that facial features with the same semantics are well aligned across images. This is usually not the case in practice. In fact, aligning facial feature points is so difficult that it is a separate research topic in the field of face recognition [3], [4], [5], [6], [7], [8], called face alignment or facial landmark estimation. Today the successful registration and tracking of non-rigidly varying geometric landmarks on the face has become a key ingredient of an automatic facial analysis system [9], [10], [11], [12].

The challenges of facial landmark estimation mainly come from the variability in the appearance of the patches centered on the landmarks, caused by lighting, occlusion, expression and so on. Many approaches to accurate non-rigid facial registration and face tracking focus on building a synthesis model that reconstructs the landmarks of a possibly unseen face image from the facial shapes and appearances of training images. One of the most famous models is the active shape model (ASM) [13], which derives the positions of landmarks from statistical information about the landmark distribution. In ASM, the point distribution model (PDM) [13] is used to model the valid shape space of face landmarks with a set of deformation parameters.

The constrained local model (CLM) [14] is another famous approach to non-rigid face registration and tracking. CLM generalizes ASM in the sense that ASM searches for potential facial landmarks in a 1D space, while CLM is based on a 2D response map. The 2D response maps are usually estimated by a discriminative local appearance model and can better capture the appearance information around facial landmarks; used wisely, this information should give better results.

Many CLM variants have been proposed recently. These methods pursue the same goal as CLM but use more robust and complex models of the landmark response distribution. In particular, the search strategy of the original CLM is based on the hypothesis that the locations of facial landmarks follow an isotropic Gaussian distribution, which is not realistic. Many methods therefore consider an anisotropic Gaussian instead [15], [16], [17]. Although the anisotropic Gaussian approximation of the response maps overcomes some drawbacks of its isotropic counterpart, its performance can still be poor, especially when the facial appearance changes considerably. To address this, other models have been investigated as well, such as the Gaussian mixture model (GMM) [18] and a nonparametric model [19]. Additionally, some works focus on improving the quality of the response maps [20], [21], [22], [23]. These methods share a common characteristic: they follow a divide-and-conquer strategy, independently training a dedicated local model (detector, regressor, or part template) for each feature point. They are therefore known as local methods.

In contrast, methods that consider all the feature points as a whole, rather than treating them as conditionally independent, are regarded as holistic methods. The active appearance model (AAM) [24] is the representative method. It simultaneously models the intrinsic variation in both appearance and shape as a linear combination of basis models of variation. Among the holistic methods, explicit shape regression (ESR) [25], the supervised descent method (SDM) [26], the ensemble of regression trees (ERT) [27] and local binary features (LBF) [3] are four state-of-the-art methods. All of them operate under the cascaded shape regression framework using shape-indexed features. ESR directly learns a regression function to infer the shape from a sparse subset of pixel intensities indexed relative to the current shape estimate, while ERT replaces the weak fern regressor in ESR with a regression tree, which further improves the performance. SDM employs a cascaded linear regression to estimate the shape based on hand-designed SIFT features, while LBF learns a set of highly discriminative local binary features for each feature point independently and then uses the learned features jointly to learn a linear regression for the final prediction, which is highly efficient and achieves very accurate results.

Although these methods have achieved partial success in face alignment, the limitations of CLMs remain. In particular, most CLM-based models use the PDM to model the shape space. The PDM is essentially a linear approximation of the deformations of a non-rigid object's shape, combined with a global rigid transformation. Compared with the sophistication of the models designed for the response distribution, the PDM is too coarse: because the facial shape space is highly nonlinear and non-convex, the linear analysis used by the PDM is far from adequate.

In this paper, we propose a novel manifold learning method, local subspace smoothness alignment (LSSA), to address this issue. The LSSA approach smooths the nonlinear structure directly in the original feature space, using a newly defined geometric measure of the curvature of the local structures. After performing the LSSA transformation, we use the adjacent shapes for CLM fitting in an ensemble of correlated local subspaces.

This paper is organized as follows. Background on the PDM and manifold learning is given in Section 2. The motivation for and details of local subspace smoothness alignment are described in Section 3. CLM fitting with an ensemble of local subspaces learnt from LSSA is presented in Section 4. Section 5 compares LSSA with other manifold learning methods and reports extensive experiments demonstrating the importance of the manifold prior in CLM fitting. Section 6 concludes the paper.

The point distribution model

Both ASM and CLM use the point distribution model (PDM) to model the shape space. Specifically, based on principal component analysis (PCA), the PDM reconstructs the facial shape of an unseen face image linearly:

$$x = sR(\bar{x} + \Phi q) + t,$$

where $R$, $s$ and $t$ control the rigid rotation, scale and translation respectively, $\bar{x}$ is the mean shape, $q$ controls the non-rigid variations of the shape, and $\Phi$ denotes the submatrix of the basis of variations. Then all the parameters of the shape model can be denoted as p={s,R,t,q
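To make the PDM reconstruction x = sR(x̄ + Φq) + t concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes 2D landmarks stored in interleaved order (x1, y1, x2, y2, ...), training shapes that have already been rigidly aligned, and an in-plane rotation given by a single angle `theta`. The function names `fit_pdm` and `pdm_reconstruct` are hypothetical.

```python
import numpy as np


def fit_pdm(shapes, n_components):
    """Estimate the mean shape and a PCA basis from aligned training shapes.

    shapes: (N, 2n) array, one interleaved shape vector per row (assumed layout).
    Returns the mean shape (2n,) and a basis (2n, n_components) playing the role of Phi.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T
    return mean_shape, basis


def pdm_reconstruct(mean_shape, basis, q, s=1.0, theta=0.0, t=(0.0, 0.0)):
    """Evaluate x = s R (x_bar + Phi q) + t, landmark by landmark.

    Returns an (n, 2) array of landmark coordinates.
    """
    # Non-rigid part: mean shape plus a linear combination of basis vectors.
    non_rigid = (mean_shape + basis @ q).reshape(-1, 2)
    # Rigid part: 2D rotation, isotropic scale, translation.
    c, sn = np.cos(theta), np.sin(theta)
    R = np.array([[c, -sn], [sn, c]])
    return s * non_rigid @ R.T + np.asarray(t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(20, 10))  # synthetic stand-in for 20 shapes with 5 landmarks
    mean_shape, basis = fit_pdm(train, n_components=3)
    shape = pdm_reconstruct(mean_shape, basis, q=np.array([0.5, 0.0, -0.2]), s=1.2, theta=0.1)
    print(shape.shape)
```

Because the shape is a linear function of q for fixed rigid parameters, every shape the model can produce lies in a single affine subspace around the mean shape, which is the limitation the paper attributes to the PDM.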

Local subspace smoothness alignment (LSSA)

In this section, we describe our local subspace smoothness alignment method, which overcomes some limitations of the traditional manifold learning methods.

Face alignment with an ensemble of local subspaces

In this section, we show how to apply the proposed method to face alignment, which effectively improves the robustness of CLM fitting compared to the traditional PDM. Let x0 denote the shape formed by concatenating the locations of the facial key points estimated with discriminative detectors. One of the most important components of a face alignment system is then to verify whether this newly estimated shape x0 is a valid “face” shape, and further to recommend a better one
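As a generic illustration of validating a detected shape against a local subspace, the sketch below projects x0 onto a PCA subspace spanned by its nearest training shapes and reports the residual. This is not the LSSA procedure or the paper's ensemble of correlated subspaces. It is a plain local-PCA projection under stated assumptions: Euclidean nearest neighbours, a single subspace, and hypothetical parameters `k` and `n_components`. The function name `project_to_local_subspace` is also hypothetical.

```python
import numpy as np


def project_to_local_subspace(x0, train_shapes, k=10, n_components=3):
    """Project x0 onto a PCA subspace fitted to its k nearest training shapes.

    x0: (d,) detected shape vector; train_shapes: (N, d) training shape vectors.
    Returns the projected shape and the norm of the residual x0 - projection.
    """
    # Pick the k training shapes closest to x0 in Euclidean distance.
    dists = np.linalg.norm(train_shapes - x0, axis=1)
    neighbors = train_shapes[np.argsort(dists)[:k]]
    # Fit a local affine subspace (neighbourhood mean plus leading PCA directions).
    center = neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(neighbors - center, full_matrices=False)
    basis = vt[:n_components].T
    # Orthogonal projection of x0 onto that subspace.
    coeffs = basis.T @ (x0 - center)
    x_proj = center + basis @ coeffs
    residual = np.linalg.norm(x0 - x_proj)
    return x_proj, residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 10))  # synthetic stand-in for training shapes
    x0 = train[0] + 0.1 * rng.normal(size=10)
    x_proj, residual = project_to_local_subspace(x0, train)
    print(x_proj.shape, residual >= 0.0)
```

In such a scheme a large residual would suggest that x0 lies far from the locally observed shapes, and the projection would serve as the corrected candidate. How the paper combines several subspaces is not shown in this excerpt.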

Experiments

In this section, we present our experiments on two tasks. The first is manifold learning, in which we compare the proposed method with several classic manifold learning methods. We then apply our method to the task of face alignment and verify its performance on two challenging face databases, the LFPW database [21] and the LFW database [42].

Conclusion

In this paper, we present a novel manifold-based approach to constrained local model fitting, named local subspace smoothness alignment (LSSA). The LSSA method learns the manifold in the original dimensionality with a new geometric measure of the curvature of local structures. Based on the learnt manifold, we introduce an improved face alignment method under the framework of the constrained local model (CLM). It performs robust CLM fitting in the original feature space but using adjacent deformations of

Acknowledgments

We thank the anonymous reviewers for their in-depth comments, suggestions, and corrections, which have greatly improved the paper. This work is partially supported by National Science Foundation of China (61373060) and Qing Lan Project.

Dakun Liu was born in 1984. He received his BS and MS degrees in applied mathematics in 2006 and 2009 respectively. He received his Ph.D. degree in computer science and technology from Nanjing University of Aeronautics and Astronautics in 2016. Now he is a lecturer in Yancheng Institution of Technology. His research interests include computer vision, machine learning, etc.

References (47)

  • D.D. Lee et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • S. Ren, X. Cao, Y. Wei, J. Sun, Face alignment at 3000fps via regressing local binary features, in: 2014 IEEE...
  • X. Cao et al., Face alignment by explicit shape regression, Int. J. Comput. Vis. (2014)
  • D. Lee, H. Park, C.D. Yoo, Face alignment using cascade gaussian process regression trees, in: Proceedings of the IEEE...
  • S. Zhu, C. Li, C.C. Loy, X. Tang, Face alignment by coarse-to-fine shape searching, in: Proceedings of the IEEE...
  • Y. Sun et al., Low rank driven robust facial landmark regression, Neurocomputing (2015)
  • Y. Yang, Y. Su, D. Cai, M. Xu, Nonlinear deformation learning for face alignment across expression and pose,...
  • Z. Cui, S. Shan, H. Zhang, S. Lao, X. Chen, Image sets alignment for video-based face recognition, in: 2012 IEEE...
  • V. Le, J. Brandt, Z. Lin, L. Bourdev, T.S. Huang, Interactive facial feature localization, in: Computer Vision – ECCV...
  • G. Tzimiropoulos, Project-out cascaded regression with an application to face alignment, in: Proceedings of the IEEE...
  • Y. Zhang et al., Automatic face annotation in tv series by video/script alignment, Neurocomputing (2015)
  • T.F. Cootes et al., Active shape models – their training and application, Comput. Vis. Image Underst. (1995)
  • D. Cristinacce, T.F. Cootes, Feature detection and tracking with constrained local models, in: BMVC, vol. 1, Citeseer,...
  • K. Nickels et al., Estimating uncertainty in ssd-based feature tracking, Image Vis. Comput. (2002)
  • X.S. Zhou et al., An information fusion framework for robust shape tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • Y. Wang, S. Lucey, J.F. Cohn, Enforcing convexity for improved alignment with constrained local models, in: 2008 IEEE...
  • L. Gu, T. Kanade, A generative shape regularization model for robust face alignment, in: Computer Vision – ECCV 2008,...
  • J.M. Saragih et al., Deformable model fitting by regularized landmark mean-shift, Int. J. Comput. Vis. (2011)
  • A. Asthana, S. Zafeiriou, S. Cheng, M. Pantic, Robust discriminative response map fitting with constrained local...
  • P.N. Belhumeur et al., Localizing parts of faces using a consensus of exemplars, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • S. Cheng, S. Zafeiriou, A. Asthana, M. Pantic, 3d facial geometric features for constrained local model, in: 2014 IEEE...
  • P. Martins, R. Caseiro, J.F. Henriques, J. Batista, Likelihood-enhanced Bayesian constrained local models, in: 2014...

    Xiaoyang Tan was born in 1971. He received the Ph.D. degree in machine learning from Nanjing University in 2005. He is a professor and Ph.D. supervisor at Nanjing University of Aeronautics and Astronautics. His research interests include computer vision, machine learning, etc.
