Abstract
Fitness tracking devices have risen in popularity in recent years, but their limited accuracy and failure to track many common exercises present a need for improved fitness tracking solutions. This work proposes a multimodal deep learning approach that leverages multiple data sources for robust and accurate activity segmentation, exercise recognition and repetition counting. For this, we introduce the MM-Fit dataset: a substantial collection of inertial sensor data from smartphones, smartwatches and earbuds worn by participants while performing full-body workouts, together with time-synchronised multi-viewpoint RGB-D video and 2D and 3D pose estimates. We establish a strong baseline for activity segmentation and exercise recognition on the MM-Fit dataset, and demonstrate the effectiveness of our CNN-based architecture at extracting modality-specific spatial-temporal features from inertial sensor and skeleton sequence data. We compare the performance of unimodal and multimodal models for activity recognition across a number of sensing devices and modalities. Furthermore, we demonstrate the effectiveness of multimodal deep learning at learning cross-modal representations for activity recognition, achieving 96% accuracy across all sensing modalities on unseen subjects in the MM-Fit dataset; 94% using data from the smartwatch only; 85% from the smartphone only; and 82% from the earbud device. We strengthen single-device performance with a zeroing-out training strategy, which phases out the other sensing modalities during training. Finally, we implement and evaluate a strong repetition counting baseline on our MM-Fit dataset. Collectively, these tasks contribute to recognising, segmenting and timing exercise and non-exercise activities for automatic exercise logging.
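The zeroing-out training strategy described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes minibatches arrive as a dictionary of per-modality NumPy arrays (the names `smartwatch`, `smartphone` and `skeleton`, and the function names, are hypothetical), and shows how all but one modality can be zeroed so a fusion network learns to cope with single-device input.

```python
import numpy as np

def zero_out_modalities(batch, keep):
    """Zero every modality except `keep`, simulating single-device input.

    batch: dict mapping modality name -> np.ndarray of shape (N, ...)
    keep:  name of the modality to leave untouched
    """
    return {
        name: x if name == keep else np.zeros_like(x)
        for name, x in batch.items()
    }

def sample_training_batch(batch, p_zero, rng):
    """With probability p_zero, keep only one randomly chosen modality;
    otherwise pass the full multimodal batch through unchanged."""
    if rng.random() < p_zero:
        keep = rng.choice(sorted(batch))
        return zero_out_modalities(batch, keep)
    return batch

rng = np.random.default_rng(0)
batch = {
    "smartwatch": rng.normal(size=(8, 100, 6)),    # accel + gyro windows
    "smartphone": rng.normal(size=(8, 100, 6)),
    "skeleton":   rng.normal(size=(8, 100, 17, 3)),
}
# Keep only the smartwatch stream; the fused model must now rely on it alone.
masked = zero_out_modalities(batch, keep="smartwatch")
```

Applied stochastically via `sample_training_batch` over the course of training, this progressively forces the shared representation to remain predictive when only one device is present at test time.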
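A repetition counting baseline of the kind evaluated above is often built from smoothing plus peak detection on a single inertial axis. The sketch below is an assumed, simplified illustration of that general idea (smoothing window, threshold and function names are our own choices, not the paper's method).

```python
import numpy as np

def moving_average(x, w):
    """Smooth a 1-D signal with a simple length-w moving-average kernel."""
    kernel = np.ones(w) / w
    return np.convolve(x, kernel, mode="same")

def count_repetitions(signal, fs, min_rep_period=0.5, smooth_s=0.2):
    """Count repetitions in a 1-D inertial signal by thresholded peak detection.

    signal:         e.g. the dominant accelerometer axis of one exercise set
    fs:             sampling rate in Hz
    min_rep_period: minimum seconds between consecutive repetitions
    smooth_s:       smoothing window length in seconds
    """
    x = moving_average(signal, max(1, int(smooth_s * fs)))
    thresh = x.mean() + 0.3 * x.std()      # only count prominent peaks
    min_gap = int(min_rep_period * fs)     # debounce: one count per rep
    count, last = 0, -min_gap
    for i in range(1, len(x) - 1):
        is_peak = x[i] >= x[i - 1] and x[i] > x[i + 1]
        if is_peak and x[i] > thresh and i - last >= min_gap:
            count += 1
            last = i
    return count

# Synthetic check: 10 s at 50 Hz with one "rep" per second.
t = np.arange(0, 10, 1 / 50)
sine = np.sin(2 * np.pi * 1.0 * t)
n_reps = count_repetitions(sine, fs=50)    # expected 10 for this clean signal
```

Real inertial data is far noisier than this sine wave, which is why learned approaches are attractive; the sketch only conveys the shape of a classical baseline.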
Index Terms
- MM-Fit: Multimodal Deep Learning for Automatic Exercise Logging across Sensing Devices