DOI: 10.1145/3394171.3413509
Research article

Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking

Published: 12 October 2020

ABSTRACT

Although heatmap regression is considered a state-of-the-art method to locate facial landmarks, it suffers from huge spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regression method for facial landmark localization. First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates. These 1D heatmaps reduce spatial complexity significantly compared to current heatmap regression methods, which use 2D heatmaps to represent the joint distributions of x and y coordinates. With much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps despite limited GPU memory, significantly alleviating the quantization error. Second, a co-attention mechanism is adopted to model the inherent spatial patterns existing in x and y coordinates, and therefore the joint distributions on the x and y axes are also captured. Third, based on the 1D heatmap structures, we propose a facial landmark detector capturing spatial patterns for landmark detection on an image; and a tracker further capturing temporal patterns with a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method.
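The trade-off between heatmap resolution, memory, and quantization error described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the Gaussian encoding, the argmax decoding, and the helper names (`gaussian_1d`, `encode_landmark`, `decode_landmark`, `quantization_error`) are assumptions chosen for illustration, and the co-attention and temporal refinement components are left out.

```python
import numpy as np


def gaussian_1d(length, center, sigma=1.5):
    """1D Gaussian heatmap with `length` bins, peaked at `center` (in bins)."""
    grid = np.arange(length, dtype=np.float64)
    return np.exp(-0.5 * ((grid - center) / sigma) ** 2)


def encode_landmark(x, y, image_size, heatmap_size):
    """Encode one landmark (image pixels) as two 1D heatmaps (illustrative)."""
    scale = heatmap_size / image_size
    return (gaussian_1d(heatmap_size, x * scale),
            gaussian_1d(heatmap_size, y * scale))


def decode_landmark(hx, hy, image_size, heatmap_size):
    """Decode by argmax on each axis, then rescale back to image pixels."""
    scale = image_size / heatmap_size
    return np.argmax(hx) * scale, np.argmax(hy) * scale


def quantization_error(heatmap_size, image_size=256, x=100.3, y=57.8):
    """Worst per-axis error (pixels) after an encode/decode round trip."""
    hx, hy = encode_landmark(x, y, image_size, heatmap_size)
    x_hat, y_hat = decode_landmark(hx, hy, image_size, heatmap_size)
    return max(abs(x_hat - x), abs(y_hat - y))


for size in (64, 256, 1024):
    cells_1d = 2 * size   # one x heatmap plus one y heatmap per landmark
    cells_2d = size ** 2  # one joint 2D heatmap per landmark
    print(f"size={size:5d}  1D cells={cells_1d:6d}  2D cells={cells_2d:8d}  "
          f"error={quantization_error(size):.3f} px")
```

Under these assumptions, a pair of 1D heatmaps with 1024 bins stores fewer values per landmark than a single 64×64 2D heatmap, while the argmax decoding error shrinks as the bin width shrinks.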

Supplemental Material

3394171.3413509.mp4 (MP4, 189.7 MB)

    Published in

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
      ISBN:9781450379885
      DOI:10.1145/3394171

      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
