research-article

Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking

Authors:
Shi Yin, University of Science and Technology of China, Hefei, China
Shangfei Wang, University of Science and Technology of China, Hefei, China
Xiaoping Chen, University of Science and Technology of China, Hefei, China
Enhong Chen, University of Science and Technology of China, Hefei, China
Cong Liang, University of Science and Technology of China, Hefei, China

MM '20: Proceedings of the 28th ACM International Conference on Multimedia, October 2020, Pages 538–546. https://doi.org/10.1145/3394171.3413509

Published: 12 October 2020

ABSTRACT

Although heatmap regression is considered a state-of-the-art method for locating facial landmarks, it suffers from high spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regression method for facial landmark localization. First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates. These 1D heatmaps significantly reduce spatial complexity compared to current heatmap regression methods, which use 2D heatmaps to represent the joint distribution of the x and y coordinates. With much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps despite limited GPU memory, significantly alleviating the quantization error. Second, a co-attention mechanism is adopted to model the inherent spatial patterns in the x and y coordinates, so that the joint distribution over the x and y axes is also captured. Third, based on the 1D heatmap structures, we propose a facial landmark detector that captures spatial patterns for landmark detection in an image, and a tracker that further captures temporal patterns through a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method.
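The 1D heatmap idea in the abstract can be illustrated with a small sketch. It is not the authors' network: the Gaussian encoding, the argmax decoding, the `sigma` value, the function names, and the 68-landmark count are all assumptions chosen for illustration, and the co-attention and temporal refinement components are not modelled. It only shows how a landmark can be represented by two marginal 1D heatmaps, how finer bins shrink the quantization error, and how the number of output cells compares with 2D heatmaps.

```python
import math


def encode_1d(coord, length, sigma=1.0):
    """Build a 1D Gaussian heatmap with `length` bins centred on `coord` (bin units)."""
    heat = [math.exp(-((i - coord) ** 2) / (2.0 * sigma ** 2)) for i in range(length)]
    total = sum(heat)
    return [h / total for h in heat]  # normalise so the bins sum to 1


def decode_1d(heatmap):
    """Return the index of the highest bin (plain argmax decoding, an assumption)."""
    return max(range(len(heatmap)), key=heatmap.__getitem__)


def landmark_via_1d(x, y, image_size, bins):
    """Encode (x, y) in pixels as two 1D heatmaps of `bins` bins each, then decode."""
    scale = bins / image_size  # bins per pixel
    hx = encode_1d(x * scale, bins)  # marginal heatmap over the x axis
    hy = encode_1d(y * scale, bins)  # marginal heatmap over the y axis
    return decode_1d(hx) / scale, decode_1d(hy) / scale


def heatmap_cells(num_landmarks, bins):
    """Count output cells for 2D joint heatmaps versus paired 1D heatmaps."""
    return num_landmarks * bins * bins, num_landmarks * 2 * bins


if __name__ == "__main__":
    x, y = 100.3, 57.8  # hypothetical ground-truth landmark in a 256-pixel image
    print(landmark_via_1d(x, y, image_size=256, bins=64))    # coarse bins
    print(landmark_via_1d(x, y, image_size=256, bins=1024))  # fine bins
    print(heatmap_cells(num_landmarks=68, bins=256))
```

With 64 bins the decoded point is off by up to 1.8 pixels in this example, while 1024 bins bring the error down to 0.05 pixels. The cell count grows linearly with resolution for the paired 1D heatmaps and quadratically for 2D heatmaps, which is the trade-off the abstract describes.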

Supplemental Material

3394171.3413509.mp4 (mp4, 189.7 MB)

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Biometrics

Published in
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
General Chairs:
Chang Wen Chen, Chinese University of Hong Kong, Shenzhen, China
Rita Cucchiara, UNIMORE, Italy
Xian-Sheng Hua, Alibaba Group, China
Program Chairs:
Guo-Jun Qi, Futurewei Technologies, USA
Elisa Ricci, UNITN & Fondazione Bruno Kessler, Italy
Zhengyou Zhang, Tencent, China
Roger Zimmermann, National University of Singapore, Singapore
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
facial landmark detection
facial landmark tracking
heatmap regression
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%