RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization


Abstract

In this work, we use a multi-column CNN framework to estimate the gaze point of a person sitting in front of a display from an RGB-D image of that person. Since the gaze point is determined by the head pose, the eyeball pose, and the 3D eye position, we propose to infer these three components separately and then integrate them for gaze point estimation. Captured depth images, however, usually contain noise and holes that prevent reliable estimation of head pose and 3D eye position. We therefore refine the raw depths of the 68 facial keypoints: we first estimate their relative depths from the RGB face image and then combine these with the captured raw depths to solve for the absolute depth of every keypoint through global optimization. The refined depths yield reliable estimates of both head pose and 3D eye position. Because existing publicly available RGB-D gaze tracking datasets are small, we also build a new dataset for training and validating our method; to the best of our knowledge, it is the largest RGB-D gaze tracking dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the Eyediap dataset.
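
The depth-refinement step can be read as a global least-squares fit, and the paper's citation of the OSQP quadratic-programming solver [29] is consistent with that reading. Below is a minimal sketch, assuming each refined landmark depth is pulled toward its valid raw depth while pairwise depth differences are pulled toward the network-predicted relative depths; the function name, weights, and the exact pairwise formulation are illustrative assumptions, not the authors' published implementation:

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np

def refine_landmark_depths(raw_depth, rel_depth, w_raw=1.0, w_rel=1.0):
    """Fuse raw sensor depths with RGB-predicted relative depths.

    raw_depth: (68,) raw depths at the facial landmarks; zeros mark holes.
    rel_depth: (68,) relative depths predicted from the RGB face image
               (meaningful only up to a global offset).
    Returns refined absolute depths for all 68 landmarks.
    """
    n = raw_depth.shape[0]
    rows, rhs = [], []

    # Data term: keep d_i near the raw depth wherever the sensor value is valid.
    for i in np.flatnonzero(raw_depth > 0):
        row = np.zeros(n)
        row[i] = w_raw
        rows.append(row)
        rhs.append(w_raw * raw_depth[i])

    # Relative term: pairwise differences should match the predicted
    # relative depths, i.e. d_i - d_j ~ rel_depth[i] - rel_depth[j].
    for i in range(n):
        for j in range(i + 1, n):
            row = np.zeros(n)
            row[i], row[j] = w_rel, -w_rel
            rows.append(row)
            rhs.append(w_rel * (rel_depth[i] - rel_depth[j]))

    A, b = np.stack(rows), np.asarray(rhs)
    d, *_ = np.linalg.lstsq(A, b, rcond=None)  # one global solve for all landmarks
    return d
```

Treating the residuals as independent (diagonal weights) matches the simplifying assumption in note 7 below, and the same objective could equally be handed to a QP solver such as OSQP [29].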


Notes

  1. Gaze estimation is also referred to as eye-gaze estimation, eye tracking, or gaze tracking in the literature.

  2. The eyeball pose has also been denoted as the eyeball movement in [41].

  3. In fact, most learning-based methods, including ours, can be person-specific if they are trained or fine-tuned with samples from each specific person.

  4. The depth of the eye is computed as the median of the depth values within a square region of the depth image (a minimal sketch follows these notes).

  5. http://dlib.net/.

  6. https://pytorch.org/.

  7. Only landmarks with nonzero depths were used. For simplicity, we assumed that all elements of the 3D facial landmark vectors are independent random variables.
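
A minimal sketch of the eye-depth computation in note 4: a median over a square window centered on the detected eye, ignoring zero-valued (hole) pixels. The window size, the zero-as-hole convention, and all names are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np

def eye_depth(depth_image, eye_uv, half_size=10):
    """Median depth within a square window around the detected eye center.

    depth_image: (H, W) raw depth map; zeros mark holes (assumed convention).
    eye_uv: (u, v) integer pixel coordinates of the detected eye center.
    half_size: half the window side length in pixels (illustrative value).
    """
    u, v = eye_uv
    h, w = depth_image.shape
    patch = depth_image[max(v - half_size, 0):min(v + half_size + 1, h),
                        max(u - half_size, 0):min(u + half_size + 1, w)]
    valid = patch[patch > 0]  # ignore holes in the raw depth
    return float(np.median(valid)) if valid.size else 0.0
```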

References

  1. Barron, J.T., Malik, J.: Intrinsic scene properties from a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 17–24 (2013)

  2. Cireşan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification (2012). arXiv:1202.2745

  3. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, pp. 2366–2374 (2014)

  4. Funes-Mora, K.A., Odobez, J.M.: Gaze estimation in the 3D space using RGB-D sensors. Int. J. Comput. Vision 118(2), 194–216 (2016)

  5. Ghiass, R.S., Arandjelovic, O.: Highly accurate gaze estimation using a consumer RGB-D sensor. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3368–3374. AAAI Press (2016)

  6. Hansen, D.W., Ji, Q.: In the eye of the beholder: a survey of models for eyes and gaze. IEEE Trans. Pattern Anal. Mach. Intell. 32(3), 478–500 (2010)

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  8. He, Q., Hong, X., Chai, X., Holappa, J., Zhao, G., Chen, X., Pietikäinen, M.: OMEG: Oulu multi-pose eye gaze dataset. In: Scandinavian Conference on Image Analysis, pp. 418–427. Springer (2015)

  9. Huang, Q., Veeraraghavan, A., Sabharwal, A.: TabletGaze: unconstrained appearance-based gaze estimation in mobile tablets (2015). arXiv:1508.01244

  10. Kar, A., Corcoran, P.: A review and analysis of eye-gaze estimation systems, algorithms and performance evaluation methods in consumer platforms. IEEE Access 5, 16495–16519 (2017)

  11. Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., Torralba, A.: Eye tracking for everyone. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2176–2184 (2016)

  12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  13. Kuno, Y., Yagi, T., Uchikawa, Y.: Development of a fish-eye VR system with human visual functioning and biological signals. In: 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems (Cat. No. 96TH8242), pp. 389–394. IEEE (1996)

  14. Kuno, Y., Yagi, T., Uchikawa, Y.: Development of eye-gaze input interface. In: Proceedings of the 7th International Conference on Human Computer Interaction, vol. 1, p. 44 (1997)

  15. Liu, G., Yu, Y., Funes-Mora, K.A., Odobez, J.M.: A differential approach for gaze estimation with calibration. In: 29th British Machine Vision Conference (2018)

  16. Liu, G., Yu, Y., Mora, K.A.F., Odobez, J.M.: A differential approach for gaze estimation (2019). arXiv:1904.09459

  17. Majaranta, P., Bulling, A.: Eye tracking and eye-based human–computer interaction. In: Advances in Physiological Computing, pp. 39–65. Springer (2014)

  18. Masko, D.: Calibration in eye tracking using transfer learning. Master thesis, KTH, School of Computer Science and Communication (CSC) (2017)

  19. McMurrough, C.D., Metsis, V., Rich, J., Makedon, F.: An eye tracking dataset for point of gaze detection. In: Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA’12, pp. 305–308. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2168556.2168622

  20. Mora, K.A.F., Monay, F., Odobez, J.M.: Eyediap: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In: Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 255–258. ACM (2014)

  21. Mora, K.A.F., Odobez, J.M.: Gaze estimation from multimodal kinect data. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 25–30. IEEE (2012)

  22. Morimoto, C.H., Mimica, M.R.: Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 98(1), 4–24 (2005)

  23. Ranjan, R., De Mello, S., Kautz, J.: Light-weight head pose invariant gaze tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2156–2164 (2018)

  24. Rayner, K.: Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124(3), 372 (1998)

  25. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  26. Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: database and results. Image Vis. Comput. 47, 3–18 (2016)

  27. Shoja Ghiass, R., Arandjelović, O., Laurendeau, D.: Highly accurate and fully automatic 3D head pose estimation and eye gaze estimation using RGB-D sensors and 3D morphable models. Sensors 18(12), 4280 (2018)

  28. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556

  29. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12, 637–672 (2020). https://doi.org/10.1007/s12532-020-00179-2

  30. Stiefelhagen, R., Yang, J., Waibel, A.: A model-based gaze tracking system. Int. J. Artif. Intell. Tools 6(02), 193–209 (1997)

  31. Sugano, Y., Matsushita, Y., Sato, Y.: Learning-by-synthesis for appearance-based 3D gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1821–1828 (2014)

  32. Suwajanakorn, S., Hernandez, C., Seitz, S.M.: Depth from focus with your mobile phone. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3497–3506 (2015)

  33. Wang, K., Zhao, R., Su, H., Ji, Q.: Generalizing eye tracking with Bayesian adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11907–11916 (2019)

  34. Wang, Y., Yuan, G., Mi, Z., Peng, J., Ding, X., Liang, Z., Fu, X.: Continuous driver’s gaze zone estimation using RGB-D camera. Sensors 19(6), 1287 (2019)

  35. Xie, J., Girshick, R., Farhadi, A.: Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks. In: European Conference on Computer Vision, pp. 842–857. Springer (2016)

  36. Xiong, X., Liu, Z., Cai, Q., Zhang, Z.: Eye gaze tracking using an RGBD camera: a comparison with an RGB solution. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 1113–1121. ACM (2014)

  37. Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape-from-shading: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 21(8), 690–706 (1999)

  38. Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: Appearance-based gaze estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4511–4520 (2015)

  39. Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: It’s written all over your face: full-face appearance-based gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017)

  40. Zhang, Y., Funkhouser, T.: Deep depth completion of a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 175–185 (2018)

  41. Zhu, W., Deng, H.: Monocular free-head 3D gaze tracking with deep learning and geometry constraints. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3143–3152 (2017)

  42. Zhu, Z., Ji, Q., Bennett, K.P.: Nonlinear eye gaze mapping function estimation via support vector regression. In: 18th International Conference on Pattern Recognition, 2006. ICPR 2006, vol. 1, pp. 1132–1135. IEEE (2006)

Download references

Author information


Correspondence to Ziheng Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Ziheng Zhang declares that all procedures performed in studies involving human participants were in accordance with the ethical standards of the ShanghaiTech Ethics Committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, and that no study with animals was performed by any of the authors. Ziheng Zhang declares that informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, Z., Lian, D. & Gao, S. RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization. Vis Comput 37, 1731–1741 (2021). https://doi.org/10.1007/s00371-020-01934-1

