ABSTRACT
Although heatmap regression is considered a state-of-the-art method for locating facial landmarks, it has high spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regression method for facial landmark localization. First, we predict two groups of 1D heatmaps that represent the marginal distributions of the x and y coordinates. These 1D heatmaps significantly reduce spatial complexity compared with current heatmap regression methods, which use 2D heatmaps to represent the joint distribution of the x and y coordinates. Because of this much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps even with limited GPU memory, which significantly alleviates the quantization error. Second, we adopt a co-attention mechanism to model the inherent spatial patterns shared by the x and y coordinates, so that the joint distribution over the x and y axes is also captured. Third, based on the 1D heatmap structure, we propose a facial landmark detector that captures spatial patterns for landmark detection in a single image, and a tracker that additionally captures temporal patterns through a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method.
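The storage and quantization argument above can be illustrated with a small NumPy sketch. This is not the network proposed in the paper: the helper names (`gaussian_1d`, `decode_1d`, `quantisation_error`), the 256-pixel image size, the heatmap resolutions of 64 and 1024, and the Gaussian width are all assumptions chosen for illustration. The sketch only shows that two 1D heatmaps per landmark need far fewer cells than one 2D heatmap, and that decoding from a finer 1D grid leaves a smaller rounding error.

```python
import numpy as np

def gaussian_1d(center, length, sigma=2.0):
    """Unnormalised 1D Gaussian heatmap peaked at `center` (in heatmap bins)."""
    bins = np.arange(length, dtype=np.float64)
    return np.exp(-0.5 * ((bins - center) / sigma) ** 2)

def decode_1d(heatmap_x, heatmap_y, image_size):
    """Map the argmax bins of two 1D heatmaps back to image coordinates."""
    scale_x = image_size / heatmap_x.shape[0]
    scale_y = image_size / heatmap_y.shape[0]
    return np.argmax(heatmap_x) * scale_x, np.argmax(heatmap_y) * scale_y

# Illustrative values (assumptions, not taken from the paper).
image_size = 256
true_x, true_y = 100.3, 57.8
low_res, high_res = 64, 1024

# Cells stored per landmark: one low-resolution 2D heatmap versus
# two high-resolution 1D heatmaps.
cells_2d = low_res * low_res   # 64 * 64
cells_1d = 2 * high_res        # 1024 + 1024

def quantisation_error(resolution):
    """Distance between the true landmark and the one decoded from 1D heatmaps."""
    scale = resolution / image_size
    hx = gaussian_1d(true_x * scale, resolution)
    hy = gaussian_1d(true_y * scale, resolution)
    px, py = decode_1d(hx, hy, image_size)
    return float(np.hypot(px - true_x, py - true_y))

err_low = quantisation_error(low_res)
err_high = quantisation_error(high_res)

# For a separable 2D heatmap, summing over one axis yields a marginal
# that peaks at the same bin as the corresponding 1D heatmap.
hx_low = gaussian_1d(true_x * low_res / image_size, low_res)
hy_low = gaussian_1d(true_y * low_res / image_size, low_res)
joint = np.outer(hy_low, hx_low)   # rows index y, columns index x
marginal_x = joint.sum(axis=0)

print(cells_2d, cells_1d)
print(err_low, err_high)
```

In this toy setting the two 1024-bin heatmaps use fewer cells than a single 64 x 64 heatmap while leaving a smaller decoding error. Note that marginals alone discard the dependence between x and y, which is the gap the abstract says the co-attention mechanism is meant to close.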
Index Terms
- Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking