ABSTRACT
Tracking a user’s gaze on smartphones offers the potential for accessible and powerful multimodal interactions. However, phones are used in a myriad of contexts, and state-of-the-art gaze models that rely only on the front-facing RGB camera are too coarse and do not adapt adequately to changes in context. While prior research has showcased the efficacy of depth maps for gaze tracking, it has been limited to desktop-grade depth cameras, which are more capable than the sensors found in smartphones, which must be thin and low-powered. In this paper, we present a gaze tracking system that uses today’s smartphone depth camera technology to adapt to changes in distance and orientation relative to the user’s face. Unlike prior efforts that used depth sensors, we do not constrain users to maintain a fixed head position; our approach works across different use contexts in unconstrained mobile settings. The results show that our multimodal ML model has a mean gaze error of 1.89 cm, a 16.3% improvement over using RGB data alone (2.26 cm error). Our system and dataset offer the first benchmark of gaze tracking on smartphones using RGB+Depth data under different use contexts.
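The error figures above can be reproduced with a short metric sketch. This assumes the standard point-of-gaze metric for on-screen gaze tracking (mean Euclidean distance, in cm, between predicted and ground-truth screen points); the function names are illustrative, not from the paper:

```python
import math

def mean_gaze_error_cm(predictions, ground_truth):
    """Mean Euclidean distance (cm) between predicted and true on-screen gaze points."""
    assert len(predictions) == len(ground_truth) and predictions
    total = 0.0
    for (px, py), (gx, gy) in zip(predictions, ground_truth):
        total += math.hypot(px - gx, py - gy)
    return total / len(predictions)

def relative_improvement(baseline_err, model_err):
    """Percentage error reduction of one model over a baseline."""
    return 100.0 * (baseline_err - model_err) / baseline_err

# Figures reported in the abstract: RGB-only baseline 2.26 cm, RGB+Depth 1.89 cm.
print(relative_improvement(2.26, 1.89))  # ≈ 16.4 (reported as 16.3% in the abstract)
```

Note that (2.26 − 1.89) / 2.26 ≈ 16.37%, consistent with the 16.3% improvement stated in the abstract.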
RGBDGaze: Gaze Tracking on Smartphones with RGB and Depth Data