RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization


Abstract

In this work, we use a multi-column CNN framework to estimate the gaze point of a person sitting in front of a display from an RGB-D image of that person. Since the gaze point is determined by the head pose, the eyeball pose, and the 3D eye position, we propose to infer these three components separately and then integrate them for gaze point estimation. Captured depth images, however, usually contain noise and holes that prevent reliable estimation of head pose and 3D eye position. We therefore refine the raw depths of the 68 facial keypoints: we first estimate their relative depths from the RGB face image and then combine these with the captured raw depths to solve for the absolute depth of every keypoint through global optimization. The refined depths yield reliable estimates of both head pose and 3D eye position. Because existing publicly available RGB-D gaze tracking datasets are small, we also build a new dataset for training and validating our method; to the best of our knowledge, it is the largest RGB-D gaze tracking dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the Eyediap dataset.
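
The depth-refinement step can be read as a global least-squares fit, and the paper's citation of the OSQP quadratic-programming solver [29] is consistent with that reading. Below is a minimal sketch, assuming each refined landmark depth is pulled toward its valid raw depth while pairwise depth differences are pulled toward the network-predicted relative depths; the function name, weights, and the exact pairwise formulation are illustrative assumptions, not the authors' published implementation:

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np

def refine_landmark_depths(raw_depth, rel_depth, w_raw=1.0, w_rel=1.0):
    """Fuse raw sensor depths with RGB-predicted relative depths.

    raw_depth: (68,) raw depths at the facial landmarks; zeros mark holes.
    rel_depth: (68,) relative depths predicted from the RGB face image
               (meaningful only up to a global offset).
    Returns refined absolute depths for all 68 landmarks.
    """
    n = raw_depth.shape[0]
    rows, rhs = [], []

    # Data term: keep d_i near the raw depth wherever the sensor value is valid.
    for i in np.flatnonzero(raw_depth > 0):
        row = np.zeros(n)
        row[i] = w_raw
        rows.append(row)
        rhs.append(w_raw * raw_depth[i])

    # Relative term: pairwise differences should match the predicted
    # relative depths, i.e. d_i - d_j ~ rel_depth[i] - rel_depth[j].
    for i in range(n):
        for j in range(i + 1, n):
            row = np.zeros(n)
            row[i], row[j] = w_rel, -w_rel
            rows.append(row)
            rhs.append(w_rel * (rel_depth[i] - rel_depth[j]))

    A, b = np.stack(rows), np.asarray(rhs)
    d, *_ = np.linalg.lstsq(A, b, rcond=None)  # one global solve for all landmarks
    return d
```

Treating the residuals as independent (diagonal weights) matches the simplifying assumption in note 7 below, and the same objective could equally be handed to a QP solver such as OSQP [29].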


Notes

  1. Gaze estimation is also referred to as eye-gaze estimation, eye tracking, or gaze tracking in the literature.

  2. The eyeball pose has also been denoted as the eyeball movement in [41].

  3. In fact, most learning-based methods, including ours, can be person-specific if they are trained or fine-tuned with samples from each specific person.

  4. The depth of the eye is computed as the median of the depth values within a square region of the depth image (a minimal sketch follows these notes).

  5. http://dlib.net/.

  6. https://pytorch.org/.

  7. Only landmarks with nonzero depths were used. For simplicity, we assumed that all elements of the 3D facial landmark vectors are independent random variables.
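
A minimal sketch of the eye-depth computation in note 4: a median over a square window centered on the detected eye, ignoring zero-valued (hole) pixels. The window size, the zero-as-hole convention, and all names are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np

def eye_depth(depth_image, eye_uv, half_size=10):
    """Median depth within a square window around the detected eye center.

    depth_image: (H, W) raw depth map; zeros mark holes (assumed convention).
    eye_uv: (u, v) integer pixel coordinates of the detected eye center.
    half_size: half the window side length in pixels (illustrative value).
    """
    u, v = eye_uv
    h, w = depth_image.shape
    patch = depth_image[max(v - half_size, 0):min(v + half_size + 1, h),
                        max(u - half_size, 0):min(u + half_size + 1, w)]
    valid = patch[patch > 0]  # ignore holes in the raw depth
    return float(np.median(valid)) if valid.size else 0.0
```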

References

  1. Barron, J.T., Malik, J.: Intrinsic scene properties from a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 17–24 (2013)

  2. Cireşan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification (2012). arXiv:1202.2745

  3. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, pp. 2366–2374 (2014)

  4. Funes-Mora, K.A., Odobez, J.M.: Gaze estimation in the 3D space using RGB-D sensors. Int. J. Comput. Vision 118(2), 194–216 (2016)

  5. Ghiass, R.S., Arandjelovic, O.: Highly accurate gaze estimation using a consumer RGB-D sensor. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3368–3374. AAAI Press (2016)

  6. Hansen, D.W., Ji, Q.: In the eye of the beholder: a survey of models for eyes and gaze. IEEE Trans. Pattern Anal. Mach. Intell. 32(3), 478–500 (2010)

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  8. He, Q., Hong, X., Chai, X., Holappa, J., Zhao, G., Chen, X., Pietikäinen, M.: OMEG: Oulu multi-pose eye gaze dataset. In: Scandinavian Conference on Image Analysis, pp. 418–427. Springer (2015)

  9. Huang, Q., Veeraraghavan, A., Sabharwal, A.: TabletGaze: unconstrained appearance-based gaze estimation in mobile tablets (2015). arXiv:1508.01244

  10. Kar, A., Corcoran, P.: A review and analysis of eye-gaze estimation systems, algorithms and performance evaluation methods in consumer platforms. IEEE Access 5, 16495–16519 (2017)

  11. Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., Torralba, A.: Eye tracking for everyone. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2176–2184 (2016)

  12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  13. Kuno, Y., Yagi, T., Uchikawa, Y.: Development of a fish-eye VR system with human visual functioning and biological signals. In: 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems (Cat. No. 96TH8242), pp. 389–394. IEEE (1996)

  14. Kuno, Y., Yagi, T., Uchikawa, Y.: Development of eye-gaze input interface. In: Proceedings of the 7th International Conference on Human Computer Interaction, vol. 1, p. 44 (1997)

  15. Liu, G., Yu, Y., Funes-Mora, K.A., Odobez, J.M.: A differential approach for gaze estimation with calibration. In: 29th British Machine Vision Conference (2018)

  16. Liu, G., Yu, Y., Mora, K.A.F., Odobez, J.M.: A differential approach for gaze estimation (2019). arXiv:1904.09459

  17. Majaranta, P., Bulling, A.: Eye tracking and eye-based human–computer interaction. In: Advances in Physiological Computing, pp. 39–65. Springer (2014)

  18. Masko, D.: Calibration in eye tracking using transfer learning. Master thesis, KTH, School of Computer Science and Communication (CSC) (2017)

  19. McMurrough, C.D., Metsis, V., Rich, J., Makedon, F.: An eye tracking dataset for point of gaze detection. In: Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA’12, pp. 305–308. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2168556.2168622

  20. Mora, K.A.F., Monay, F., Odobez, J.M.: Eyediap: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In: Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 255–258. ACM (2014)

  21. Mora, K.A.F., Odobez, J.M.: Gaze estimation from multimodal kinect data. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 25–30. IEEE (2012)

  22. Morimoto, C.H., Mimica, M.R.: Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 98(1), 4–24 (2005)

  23. Ranjan, R., De Mello, S., Kautz, J.: Light-weight head pose invariant gaze tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2156–2164 (2018)

  24. Rayner, K.: Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124(3), 372 (1998)

  25. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  26. Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: database and results. Image Vis. Comput. 47, 3–18 (2016)

  27. Shoja Ghiass, R., Arandjelović, O., Laurendeau, D.: Highly accurate and fully automatic 3D head pose estimation and eye gaze estimation using RGB-D sensors and 3D morphable models. Sensors 18(12), 4280 (2018)

  28. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556

  29. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12, 637–672 (2020). https://doi.org/10.1007/s12532-020-00179-2

  30. Stiefelhagen, R., Yang, J., Waibel, A.: A model-based gaze tracking system. Int. J. Artif. Intell. Tools 6(02), 193–209 (1997)

  31. Sugano, Y., Matsushita, Y., Sato, Y.: Learning-by-synthesis for appearance-based 3D gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1821–1828 (2014)

  32. Suwajanakorn, S., Hernandez, C., Seitz, S.M.: Depth from focus with your mobile phone. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3497–3506 (2015)

  33. Wang, K., Zhao, R., Su, H., Ji, Q.: Generalizing eye tracking with Bayesian adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11907–11916 (2019)

  34. Wang, Y., Yuan, G., Mi, Z., Peng, J., Ding, X., Liang, Z., Fu, X.: Continuous driver’s gaze zone estimation using RGB-D camera. Sensors 19(6), 1287 (2019)

  35. Xie, J., Girshick, R., Farhadi, A.: Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks. In: European Conference on Computer Vision, pp. 842–857. Springer (2016)

  36. Xiong, X., Liu, Z., Cai, Q., Zhang, Z.: Eye gaze tracking using an RGBD camera: a comparison with an RGB solution. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 1113–1121. ACM (2014)

  37. Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape-from-shading: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 21(8), 690–706 (1999)

  38. Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: Appearance-based gaze estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4511–4520 (2015)

  39. Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: It’s written all over your face: full-face appearance-based gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2017)

  40. Zhang, Y., Funkhouser, T.: Deep depth completion of a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 175–185 (2018)

  41. Zhu, W., Deng, H.: Monocular free-head 3D gaze tracking with deep learning and geometry constraints. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3143–3152 (2017)

  42. Zhu, Z., Ji, Q., Bennett, K.P.: Nonlinear eye gaze mapping function estimation via support vector regression. In: 18th International Conference on Pattern Recognition, 2006. ICPR 2006, vol. 1, pp. 1132–1135. IEEE (2006)

Download references

Author information


Correspondence to Ziheng Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Ziheng Zhang declares that all procedures performed in studies involving human participants were in accordance with the ethical standards of the ShanghaiTech Ethics Committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, and that no study with animals was performed by any of the authors. Ziheng Zhang declares that informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, Z., Lian, D. & Gao, S. RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization. Vis Comput 37, 1731–1741 (2021). https://doi.org/10.1007/s00371-020-01934-1

