
MM-Fit: Multimodal Deep Learning for Automatic Exercise Logging across Sensing Devices

Published: 18 December 2020

Abstract

Fitness tracking devices have risen in popularity in recent years, but limitations in their accuracy and their failure to track many common exercises present a need for improved fitness tracking solutions. This work proposes a multimodal deep learning approach that leverages multiple data sources for robust and accurate activity segmentation, exercise recognition and repetition counting. For this, we introduce the MM-Fit dataset: a substantial collection of inertial sensor data from smartphones, smartwatches and earbuds worn by participants while performing full-body workouts, together with time-synchronised multi-viewpoint RGB-D video and 2D and 3D pose estimates. We establish a strong baseline for activity segmentation and exercise recognition on the MM-Fit dataset, and demonstrate the effectiveness of our CNN-based architecture at extracting modality-specific spatial-temporal features from inertial sensor and skeleton sequence data. We compare the performance of unimodal and multimodal models for activity recognition across a number of sensing devices and modalities. Furthermore, we demonstrate the effectiveness of multimodal deep learning at learning cross-modal representations for activity recognition, achieving 96% accuracy across all sensing modalities on unseen subjects in the MM-Fit dataset; 94% using data from the smartwatch only; 85% from the smartphone only; and 82% from the earbud device. We strengthen single-device performance with a zeroing-out training strategy that phases out the other sensing modalities. Finally, we implement and evaluate a strong repetition counting baseline on our MM-Fit dataset. Collectively, these tasks contribute to recognising, segmenting and timing exercise and non-exercise activities for automatic exercise logging.
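The zeroing-out idea in the abstract — training a multimodal model while suppressing all but the target device's input so the fused representation learns to cope with a single modality — can be sketched as below. This is a minimal illustration, not the paper's implementation; the function name `zero_out_modalities` and the per-modality feature dictionary are our own assumptions.

```python
import numpy as np

def zero_out_modalities(features, keep):
    """Zero the feature arrays of every modality not named in `keep`.

    features: dict mapping modality name -> 1-D feature array
    keep: set of modality names whose features are left intact

    Zeroed modalities become all-zero arrays of the same shape, so a
    downstream fusion layer still receives a fixed-size input.
    """
    out = {}
    for name, feat in features.items():
        arr = np.asarray(feat, dtype=float)
        out[name] = arr.copy() if name in keep else np.zeros_like(arr)
    return out

# Example: phase out all modalities except the smartwatch, then fuse by
# concatenating the (partly zeroed) per-modality feature vectors.
features = {
    "smartwatch": np.ones(4),
    "smartphone": np.full(4, 2.0),
    "earbud": np.full(4, 3.0),
}
single_device = zero_out_modalities(features, keep={"smartwatch"})
fused = np.concatenate([single_device[m] for m in sorted(single_device)])
```

In training, the set of zeroed modalities would be varied (and progressively biased toward the target device) so the shared representation does not over-rely on any one sensor stream.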



Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 4
December 2020, 1356 pages
EISSN: 2474-9567
DOI: 10.1145/3444864

Copyright © 2020 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed
