Research Article | Open Access
DOI: 10.1145/3536221.3556568

RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data

Published: 07 November 2022

ABSTRACT

Tracking a user’s gaze on smartphones offers the potential for accessible and powerful multimodal interactions. However, phones are used in a myriad of contexts, and state-of-the-art gaze models that rely only on the front-facing RGB camera are too coarse and do not adapt adequately to changes in context. While prior research has showcased the efficacy of depth maps for gaze tracking, those efforts were limited to desktop-grade depth cameras, which are more capable than the sensors found in smartphones, where hardware must be thin and low-power. In this paper, we present a gaze tracking system that uses today’s smartphone depth camera technology to adapt to changes in distance and orientation relative to the user’s face. Unlike prior efforts that used depth sensors, we do not constrain users to maintain a fixed head position; our approach works across different use contexts in unconstrained mobile settings. The results show that our multimodal ML model has a mean gaze error of 1.89 cm, a 16.3% improvement over using RGB data alone (2.26 cm error). Our system and dataset offer the first benchmark of gaze tracking on smartphones using RGB+Depth data under different use contexts.


Published in

ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
November 2022, 830 pages
ISBN: 9781450393904
DOI: 10.1145/3536221

Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research article
• Refereed limited

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)
