Deep, Landmark-Free FAME: Face Alignment, Modeling, and Expression Estimation

International Journal of Computer Vision

Abstract

We present a novel method for modeling 3D face shape, viewpoint, and expression from a single, unconstrained photo. Our method uses three deep convolutional neural networks to estimate each of these components separately. Importantly, unlike other methods, ours does not use facial landmark detection at test time; instead, it estimates these properties directly from image intensities. In fact, rather than relying on landmark detectors, we show how accurate landmarks can be obtained as a by-product of our modeling process. We test the proposed method rigorously. In doing so, we raise a number of concerns with existing practices for evaluating face landmark detection methods. In response to these concerns, we propose novel paradigms for testing the effectiveness of rigid and non-rigid face alignment methods without relying on landmark detection benchmarks. We evaluate rigid face alignment by measuring its effect on face recognition accuracy on the challenging IJB-A and IJB-B benchmarks. Non-rigid expression estimation is tested on the CK+ and EmotiW’17 emotion classification benchmarks. Nevertheless, we also report the accuracy of our approach as a landmark detector, for 3D landmarks on AFLW2000-3D and for 2D landmarks on 300W and AFLW-PIFA. A surprising conclusion of these results is that better landmark detection accuracy does not necessarily translate to better face processing. Parts of this paper were previously published by Tran et al. (2017) and Chang et al. (2017, 2018).
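The abstract states that landmarks can be obtained as a by-product of the modeling process rather than from a detector. A minimal sketch of that idea follows, assuming a linear 3D morphable model (in the spirit of Blanz and Vetter 1999 and Paysan et al. 2009) whose shape coefficients, expression coefficients, and viewpoint would be supplied by the three networks. Here the viewpoint is assumed to be three rotation angles plus a 3D translation. The function names, the Euler-angle convention, the landmark vertex indices, and the pinhole intrinsics are all illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def reconstruct_shape(
    mean_shape: Sequence[Vec3],
    shape_basis: Sequence[Sequence[Vec3]],
    alpha: Sequence[float],
    expr_basis: Sequence[Sequence[Vec3]],
    beta: Sequence[float],
) -> List[Vec3]:
    """Linear 3DMM: S = mean + sum_i alpha_i * S_i + sum_j beta_j * E_j, per vertex.

    Each basis component is a per-vertex displacement field; zip() pairs each
    coefficient with its component (extra coefficients or components are ignored).
    """
    components = list(zip(alpha, shape_basis)) + list(zip(beta, expr_basis))
    vertices = []
    for v, (x, y, z) in enumerate(mean_shape):
        for coeff, comp in components:
            dx, dy, dz = comp[v]
            x, y, z = x + coeff * dx, y + coeff * dy, z + coeff * dz
        vertices.append((x, y, z))
    return vertices


def _matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(pitch: float, yaw: float, roll: float) -> Mat3:
    """Rotation R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians (assumed convention)."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rz, _matmul(ry, rx))


def project_landmarks(
    vertices: Sequence[Vec3],
    landmark_ids: Sequence[int],
    pose: Tuple[float, float, float, float, float, float],
    focal: float,
    center: Tuple[float, float],
) -> List[Tuple[float, float]]:
    """Rigidly move the chosen vertices into camera space and apply a pinhole projection.

    pose = (pitch, yaw, roll, tx, ty, tz); focal and center are hypothetical intrinsics.
    """
    pitch, yaw, roll, tx, ty, tz = pose
    r = euler_to_matrix(pitch, yaw, roll)
    t = (tx, ty, tz)
    points = []
    for idx in landmark_ids:
        p = vertices[idx]
        # Camera-space coordinates: R @ p + t
        xc, yc, zc = (sum(r[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        if zc <= 0:
            raise ValueError(f"landmark {idx} lies behind the camera")
        points.append((focal * xc / zc + center[0], focal * yc / zc + center[1]))
    return points
```

For example, with zero rotation, a translation of (0, 0, 10), and a focal length of 100, a model vertex at (2, 0, 0) projects to (20, 0) relative to the principal point. Because the landmark indices are fixed on the model, any fitted face yields 2D (or, before projection, 3D) landmarks without running a separate detector.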

Figs. 1–19

Notes

  1. FPN (FacePoseNet), bundled with rendering and alignment code, is publicly available from: https://github.com/fengju514/Face-Pose-Net.

  2. The train/test partitions of PIFA are available at http://cvlab.cse.msu.edu/project-pifa.html.

References

  • Burgos-Artizzu, X. P., Perona, P., & Dollár, P. (2013). Robust face landmark estimation under occlusion. In Proceedings of the international conference on computer vision.

  • Asthana, A., Zafeiriou, S., Cheng, S., & Pantic, M. (2014). Incremental face alignment in the wild. In Proceedings of the conference on computer vision pattern recognition.

  • Baltrušaitis, T., Robinson, P., & Morency, L. P. (2013). Constrained local neural fields for robust facial landmark detection in the wild. In Proceedings of the conference on computer vision pattern recognition workshops.

  • Baltrušaitis, T., Robinson, P., & Morency, L. P. (2016). OpenFace: An open source facial behavior analysis toolkit. In Winter conference on applications of computer vision.

  • Bansal, A., Russell, B., & Gupta, A. (2016). Marr revisited: 2D-3D alignment via surface normal prediction. In Proceedings of the conference on computer vision pattern recognition.

  • Bas, A., Smith, W. A. P., Bolkart, T., & Wuhrer, S. (2016). Fitting a 3D morphable model to edges: A comparison between hard and soft correspondences. In ACCV workshops.

  • Belhumeur, P. N., Jacobs, D. W., Kriegman, D. J., & Kumar, N. (2013). Localizing parts of faces using a consensus of exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2930–2940.

  • Bhagavatula, C., Zhu, C., Luu, K., & Savvides, M. (2017). Faster than real-time facial alignment: A 3D spatial transformer network approach in unconstrained poses. In Proceedings of the international conference on computer vision.

  • Blanz, V., & Vetter, T. (1999). Morphable model for the synthesis of 3D faces. In Proceedings of ACM SIGGRAPH conference on computer graphics.

  • Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1063–1074.

  • Blanz, V., Romdhani, S., & Vetter, T. (2002). Face identification across different poses and illuminations with a 3D morphable model. In International conference on automatic face and gesture recognition.

  • Blanz, V., Scherbaum, K., Vetter, T., & Seidel, H. P. (2004). Exchanging faces in images. Computer Graphics Forum, 23(3), 669–676.

  • Booth, J., Antonakos, E., Ploumpis, S., Trigeorgis, G., Panagakis, Y., & Zafeiriou, S. (2017). 3D face morphable models “in-the-wild”. In Proceedings of conference on computer vision pattern recognition.

  • Bulat, A., & Tzimiropoulos, G. (2017a). Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources. In Proceedings of the international conference on computer vision.

  • Bulat, A., & Tzimiropoulos, G. (2017b). How far are we from solving the 2D and 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In Proceedings of the international conference on computer vision.

  • Cao, X., Wei, Y., Wen, F., & Sun, J. (2014). Face alignment by explicit shape regression. International Journal of Computer Vision, 107(2), 177–190.

  • Chang, F. J., Tran, A., Hassner, T., Masi, I., Nevatia, R., & Medioni, G. (2017). FacePoseNet: Making a case for landmark-free face alignment. In Proceedings of international conference on computer vision workshops.

  • Chang, F. J., Tran, A. T., Hassner, T., Masi, I., Nevatia, R., & Medioni, G. (2018). ExpNet: Landmark-free, deep, 3D facial expressions. In International conference on automatic face and gesture recognition.

  • Chu, B., Romdhani, S., & Chen, L. (2014). 3D-aided face recognition robust to expression and pose variations. In Proceedings of conference on computer vision pattern recognition.

  • Crosswhite, N., Byrne, J., Stauffer, C., Parkhi, O., Cao, Q., & Zisserman, A. (2017). Template adaptation for face verification and identification. In International conference on automatic face and gesture recognition.

  • Dantone, M., Gall, J., Fanelli, G., & Van Gool, L. (2012). Real-time facial feature detection using conditional regression forests. In Proceedings of conference on computer vision pattern recognition.

  • Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 19(3), 34–41.

  • Dhall, A., Goecke, R., Ghosh, S., Joshi, J., Hoey, J., & Gedeon, T. (2017). From individual to group-level emotion recognition: EmotiW 5.0. In ACM ICMI.

  • Dhall, A., Murthy, O. R., Goecke, R., Joshi, J., & Gedeon, T. (2015). Video and image based emotion recognition challenges in the wild: EmotiW 2015. In ACM ICMI.

  • Dong, X., Yan, Y., Ouyang, W., & Yang, Y. (2018a). Style aggregated network for facial landmark detection. In Proceedings of conference on computer vision pattern recognition.

  • Dong, X., Yu, S. I., Weng, X., Wei, S. E., Yang, Y., & Sheikh, Y. (2018b). Supervision-by-registration: An unsupervised approach to improve the precision of facial landmark detectors. In Proceedings of conference on computer vision pattern recognition.

  • Dong, X., Zheng, L., Ma, F., Yang, Y., & Meng, D. (2018c). Few-example object detection with model communication. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2018.2844853.

  • Dou, P., Shah, S. K., & Kakadiaris, I. A. (2017). End-to-end 3D face reconstruction with deep neural networks. In Proceedings of conference on computer vision pattern recognition.

  • Eidinger, E., Enbar, R., & Hassner, T. (2014). Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security, 9(12), 2170–2179.

  • Everingham, M., Sivic, J., & Zisserman, A. (2006). “Hello! My name is... Buffy”—Automatic naming of characters in TV video. In Proceedings of British machine vision conference.

  • Benitez-Quiroz, C. F., Srinivasan, R., & Martinez, A. M. (2016). EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of conference on computer vision pattern recognition.

  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.

  • Hassner, T. (2013). Viewing real-world faces in 3D. In Proceedings of the international conference on computer vision. Available www.openu.ac.il/home/hassner/projects/poses.

  • Hassner, T., & Basri, R. (2006). Example based 3D reconstruction from single 2D images. In Proceedings of conference on computer vision pattern recognition workshops.

  • Hassner, T., Harel, S., Paz, E., & Enbar, R. (2015). Effective face frontalization in unconstrained images. In Proceedings of conference on computer vision pattern recognition.

  • Hassner, T., Masi, I., Kim, J., Choi, J., Harel, S., Natarajan, P., & Medioni, G. (2016). Pooling faces: Template based face recognition with pooled face images. In Proceedings of conference on computer vision pattern recognition workshops.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of conference on computer vision pattern recognition.

  • Huang, G. B., Jain, V., & Learned-Miller, E. (2007). Unsupervised joint alignment of complex images. In Proceedings of the international conference on computer vision.

  • Huber, P., Hu, G., Tena, R., Mortazavian, P., Koppen, W., Christmas, W., Rätsch, M., & Kittler, J. (2016). A multiresolution 3D morphable face model and fitting framework. In VISAPP.

  • Jackson, A. S., Bulat, A., Argyriou, V., & Tzimiropoulos, G. (2017). Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In Proceedings of the international conference on computer vision.

  • Jeni, L. A., Cohn, J. F., & Kanade, T. (2015). Dense 3D face alignment from 2D videos in real-time. In International conference on automatic face and gesture recognition.

  • Jourabloo, A., & Liu, X. (2015). Pose-invariant 3D face alignment. In Proceedings of conference on computer vision pattern recognition.

  • Jourabloo, A., & Liu, X. (2016). Large-pose face alignment via CNN-based dense 3D model fitting. In Proceedings of conference on computer vision pattern recognition.

  • Kazemi, V., & Sullivan, J. (2014). One millisecond face alignment with an ensemble of regression trees. In Proceedings of conference on computer vision pattern recognition.

  • Kemelmacher-Shlizerman, I., & Basri, R. (2011). 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 394–405.

  • King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10, 1755–1758.

  • Klare, B. F., Klein, B., Taborsky, E., Blanton, A., Cheney, J., Allen, K., Grother, P., Mah, A., Burge, M., & Jain, A. K. (2015). Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark-A. In Proceedings of conference on computer vision pattern recognition.

  • Kosti, R., Alvarez, J. M., Recasens, A., & Lapedriza, A. (2017). Emotion recognition in context. In Proceedings of conference on computer vision pattern recognition.

  • Köstinger, M., Wohlhart, P., Roth, P. M., & Bischof, H. (2011). Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Proceedings of the international conference on computer vision workshops.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Neural information processing systems.

  • Kumar, A., Alavi, A., & Chellappa, R. (2017). KEPLER: Keypoint and pose estimation of unconstrained faces by learning efficient H-CNN regressors. In International conference on automatic face and gesture recognition.

  • Kumar, A., & Chellappa, R. (2018). Disentangling 3D pose in a dendritic CNN for unconstrained 2D face alignment. In Proceedings of conference on computer vision pattern recognition.

  • Le, V., Brandt, J., Lin, Z., Bourdev, L., & Huang, T. (2012). Interactive facial feature localization. In European conference on computer vision.

  • Levi, G., & Hassner, T. (2015). Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. In ACM ICMI.

  • Li, C., Zhou, K., & Lin, S. (2014). Intrinsic face image decomposition with human face priors. In European conference on computer vision.

  • Liu, Y., Jourabloo, A., Ren, W., & Liu, X. (2017). Dense face alignment. In Proceedings of conference on computer vision pattern recognition.

  • Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of the international conference on computer vision.

  • Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of conference on computer vision pattern recognition workshops.

  • Masi, I., Ferrari, C., Del Bimbo, A., & Medioni, G. (2014). Pose independent face recognition by localizing local binary patterns via deformation components. In International conference on pattern recognition (pp. 4477–4482). IEEE.

  • Masi, I., Chang, F. J., Choi, J., Harel, S., Kim, J., Kim, K., et al. (2018a). Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 379–393.

  • Masi, I., Hassner, T., Tran, A. T., & Medioni, G. (2017). Rapid synthesis of massive face sets for improved face recognition. In International conference on automatic face and gesture recognition (pp. 604–611). IEEE.

  • Masi, I., Rawls, S., Medioni, G., & Natarajan, P. (2016a). Pose-aware face recognition in the wild. In Proceedings of conference on computer vision pattern recognition.

  • Masi, I., Tran, A., Hassner, T., Leksut, J. T., & Medioni, G. (2016b). Do we really need to collect millions of faces for effective face recognition? In European conference on computer vision. Available www.openu.ac.il/home/hassner/projects/augmented_faces.

  • Masi, I., Wu, Y., Hassner, T., & Natarajan, P. (2018b). Deep face recognition: A survey. In Conference on graphics, patterns and images.

  • Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In Proceedings of British machine vision conference.

  • Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3D face model for pose and illumination invariant face recognition. In International conference on advanced video and signal based surveillance.

  • Poirson, P., Ammirato, P., Fu, C. Y., Liu, W., Kosecka, J., & Berg, A. C. (2016). Fast single shot detection and pose estimation. In 3DV.

  • Ranjan, R., Castillo, C. D., & Chellappa, R. (2017). L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507.

  • Ren, S., Cao, X., Wei, Y., & Sun, J. (2014). Face alignment at 3000 fps via regressing local binary features. In Proceedings of conference on computer vision pattern recognition.

  • Richardson, E., Sela, M., & Kimmel, R. (2016). 3D face reconstruction by learning from synthetic data. In 3DV.

  • Richardson, E., Sela, M., Or-El, R., & Kimmel, R. (2017). Learning detailed face reconstruction from a single image. In Proceedings of conference on computer vision pattern recognition.

  • Romdhani, S., & Vetter, T. (2003). Efficient, robust and accurate fitting of a 3D morphable model. In Proceedings of the international conference on computer vision.

  • Romdhani, S., & Vetter, T. (2005). Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In Proceedings of conference on computer vision pattern recognition.

  • Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M. (2013). 300 faces in-the-wild challenge: The first facial landmark localization challenge. In Proceedings of conference on computer vision pattern recognition workshops.

  • Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M. (2016). 300 faces in-the-wild challenge: Database and results. Image and Vision Computing, 47, 3–18.

  • Sela, M., Richardson, E., & Kimmel, R. (2017). Unrestricted facial geometry reconstruction using image-to-image translation. In Proceedings of the international conference on computer vision.

  • Sengupta, S., Kanazawa, A., Castillo, C. D., & Jacobs, D. (2018). SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In Proceedings of conference on computer vision pattern recognition.

  • Su, H., Qi, C. R., Li, Y., & Guibas, L. J. (2015). Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In Proceedings of the international conference on computer vision.

  • Surace, L., Patacchiola, M., Battini Sönmez, E., Spataro, W., & Cangelosi, A. (2017). Emotion recognition in the wild using deep neural networks and Bayesian classifiers. In ACM ICMI.

  • Tang, H., Hu, Y., Fu, Y., Hasegawa-Johnson, M., & Huang, T. S. (2008). Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar. In International conference on multimedia and expo.

  • Tewari, A., Zollhöfer, M., Garrido, P., Bernard, F., Kim, H., Pérez, P., & Theobalt, C. (2018). Self-supervised multi-level face model learning for monocular reconstruction at over 250 Hz. In Proceedings of conference on computer vision pattern recognition.

  • Tran, A., Hassner, T., Masi, I., & Medioni, G. (2017). Regressing robust and discriminative 3D morphable models with a very deep neural network. In Proceedings of conference on computer vision pattern recognition.

  • Tran, A. T., Hassner, T., Masi, I., Paz, E., Nirkin, Y., & Medioni, G. (2018). Extreme 3D face reconstruction: Looking past occlusions. In Proceedings of conference on computer vision pattern recognition.

  • Vetter, T., & Blanz, V. (1998). Estimating coloured 3D face models from single images: An example based approach. In European conference on computer vision.

  • Whitelam, C., Taborsky, E., Blanton, A., Maze, B., Adams, J., Miller, T., Kalka, N., Jain, A. K., Duncan, J. A., Allen, K., et al. (2017). IARPA Janus Benchmark-B face dataset. In Proceedings of conference on computer vision pattern recognition workshops.

  • Wolf, L., Hassner, T., & Maoz, I. (2011). Face recognition in unconstrained videos with matched background similarity. In Proceedings of conference on computer vision pattern recognition.

  • Wu, Y., Hassner, T., Kim, K., Medioni, G., & Natarajan, P. (2017). Facial landmark detection with tweaked convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 3067–3074.

  • Xiang, Y., Mottaghi, R., & Savarese, S. (2014). Beyond PASCAL: A benchmark for 3D object detection in the wild. In Winter conference on applications of computer vision.

  • Xiang, Y., Kim, W., Chen, W., Ji, J., Choy, C., Su, H., Mottaghi, R., Guibas, L., & Savarese, S. (2016). ObjectNet3D: A large scale database for 3D object recognition. In European conference on computer vision.

  • Xie, L., Wang, J., Wei, Z., Wang, M., & Tian, Q. (2016). DisturbLabel: Regularizing CNN on the loss layer. In Proceedings of conference on computer vision pattern recognition (pp. 4753–4762).

  • Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In Proceedings of conference on computer vision pattern recognition.

  • Yang, Z., & Nevatia, R. (2016). A multi-scale cascade fully convolutional network face detector. In ICPR.

  • Yang, F., Wang, J., Shechtman, E., Bourdev, L., & Metaxas, D. (2011). Expression flow for 3D-aware face component transfer. ACM Transactions on Graphics, 30(4), 60.

  • Yi, D., Lei, Z., Liao, S., & Li, S. Z. (2014). Learning face representation from scratch. arXiv preprint arXiv:1411.7923. Available http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html.

  • Yu, X., Huang, J., Zhang, S., Yan, W., & Metaxas, D. N. (2013). Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In Proceedings of the international conference on computer vision (pp. 1944–1951). IEEE.

  • Zadeh, A., Baltrušaitis, T., & Morency, L. P. (2016). Deep constrained local models for facial landmark detection. arXiv preprint arXiv:1611.08657.

  • Zafeiriou, S., Chrysos, G. G., Roussos, A., Ververas, E., Deng, J., & Trigeorgis, G. (2017). The 3D menpo facial landmark tracking challenge. In Proceedings of international conference on computer vision workshops.

  • Zafeiriou, S., Papaioannou, A., Kotsia, I., Nicolaou, M., & Zhao, G. (2016). Facial affect “in-the-wild”. In Proceedings of conference on computer vision pattern recognition workshops (pp. 36–47).

  • Zhang, J., Shan, S., Kan, M., & Chen, X. (2014). Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In European conference on computer vision. Springer.

  • Zhang, K., Tan, L., Li, Z., & Qiao, Y. (2016). Gender and smile classification using deep convolutional neural networks. In Proceedings of conference on computer vision pattern recognition workshops (pp. 34–38).

  • Zhu, S., Li, C., Loy, C. C., & Tang, X. (2015a). Face alignment by coarse-to-fine shape searching. In Proceedings of conference on computer vision pattern recognition.

  • Zhu, S., Li, C., Loy, C. C., & Tang, X. (2016a). Unconstrained face alignment via cascaded compositional learning. In Proceedings of conference on computer vision pattern recognition.

  • Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. (2016b). Face alignment across large poses: A 3D solution. In Proceedings of conference on computer vision pattern recognition.

  • Zhu, X., Lei, Z., Yan, J., Yi, D., & Li, S. Z. (2015b). High-fidelity pose and expression normalization for face recognition in the wild. In Proceedings of conference on computer vision pattern recognition (pp. 787–796).

  • Zhu, X., & Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild. In Proceedings of conference on computer vision pattern recognition.

Acknowledgements

This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA 2014-14071600011. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purpose notwithstanding any copyright annotation thereon.

Author information

Correspondence to Tal Hassner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chang, FJ., Tran, A.T., Hassner, T. et al. Deep, Landmark-Free FAME: Face Alignment, Modeling, and Expression Estimation. Int J Comput Vis 127, 930–956 (2019). https://doi.org/10.1007/s11263-019-01151-x

  • DOI: https://doi.org/10.1007/s11263-019-01151-x
