research-article
Open Access

EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors

Published: 26 July 2023

Abstract

Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, including 6 inertial measurement units (IMUs) and a monocular phone camera. On one hand, inertial mocap suffers from large translation drift due to the lack of a global positioning signal. EgoLocate leverages image-based simultaneous localization and mapping (SLAM) techniques to locate the human in the reconstructed scene. On the other hand, SLAM often fails when visual features are poor. EgoLocate uses inertial mocap to provide a strong prior for the camera motion. Experiments show that localization, a key challenge for both fields, is largely improved by our technique compared with the state of the art in each field. Our code is available for research at https://xinyu-yi.github.io/EgoLocate/.
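The core idea in the abstract (inertial dead reckoning drifts over time, while sparse absolute fixes from visual localization can anchor it) can be illustrated with a deliberately simplified sketch. This is not the EgoLocate algorithm; the function name, the scalar blend weight, and the toy 1D data are all illustrative assumptions.

```python
def fuse_translation(imu_steps, visual_fixes, blend=0.5):
    """Toy 1D fusion of inertial dead reckoning with sparse visual fixes.

    imu_steps: per-frame displacement estimates (noisy/biased, so the
        integrated position drifts).
    visual_fixes: dict mapping frame index -> absolute position estimate
        from visual localization (available only at sparse frames).
    blend: weight given to the visual fix when one is available.
    Returns the fused position trajectory.
    """
    pos = 0.0
    trajectory = []
    for i, step in enumerate(imu_steps):
        pos += step  # inertial dead reckoning: drift accumulates here
        if i in visual_fixes:
            # sparse absolute correction pulls the estimate back on track
            pos = (1 - blend) * pos + blend * visual_fixes[i]
        trajectory.append(pos)
    return trajectory


# Ground truth: 1.0 per frame; IMU steps carry a 5% bias.
fused = fuse_translation([1.05] * 30, {9: 10.0, 19: 20.0, 29: 30.0})
imu_only = fuse_translation([1.05] * 30, {})
print(abs(imu_only[-1] - 30.0), abs(fused[-1] - 30.0))
```

Even this toy version shows the mutual benefit described above: the final error with sparse fixes is several times smaller than with pure dead reckoning, while the inertial estimate bridges the gaps between fixes.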


Supplemental Material

papers_168_VOD.mp4 (presentation video, MP4, 398.7 MB)



Published in

ACM Transactions on Graphics, Volume 42, Issue 4
August 2023, 1912 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3609020

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

