Open Access

VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors

Authors:
Prasoon Patidar, Carnegie Mellon University, Pittsburgh, PA, USA (ORCID: 0000-0003-0034-2767)
Mayank Goel, Carnegie Mellon University, Pittsburgh, PA, USA (ORCID: 0000-0003-1237-7545)
Yuvraj Agarwal, Carnegie Mellon University, Pittsburgh, PA, USA (ORCID: 0000-0001-9304-6080)

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3, Article No. 117, pp. 1–24. https://doi.org/10.1145/3610907

Published: 27 September 2023

Abstract

The use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of ML models pre-trained on large corpora of labeled data. However, audio and video sensors also raise significant consumer privacy concerns. Researchers have thus explored alternative, less privacy-invasive modalities such as mmWave Doppler radars, IMUs, and motion sensors. The key limitation of these approaches, however, is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of labeled training data, with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, once VAX has trained the ML models for the privacy-sensitive sensors, with little to no user involvement, the audio/video sensors can be removed altogether to better protect the user's privacy. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that, after training, VAX can use its onboard camera and microphone to detect approximately 15 of the 17 activities with an average accuracy of 90%. For these activities, VAX trains a per-home model for the privacy-preserving sensors that requires no in-situ user input (average accuracy = 84%). In addition, when VAX is augmented with just one labeled instance for each activity not detected by the VAX A/V pipeline (~2 of 17), it can detect all 17 activities with an average accuracy of 84%.
Our results show that VAX is significantly better than a baseline supervised-learning approach that uses one labeled instance per activity in each home (average accuracy of 79%), while reducing the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).
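The central idea above, using labels from existing A/V models to train a per-home model for a privacy-sensitive sensor and adding one user-provided label for each activity the A/V pipeline misses, can be illustrated with a minimal sketch. It is not the VAX implementation (the authors' code is at https://github.com/synergylabs/vax); the function names, the `min_confidence` threshold, the nearest-centroid "student" classifier, the 2-D features, and the activity names are all hypothetical assumptions chosen to keep the example short and dependency-free.

```python
# Hypothetical sketch of cross-modal label transfer in the spirit of VAX.
# Assumptions (not the authors' implementation): each time window carries a
# feature vector from a privacy-sensitive "X" sensor, an A/V "teacher" emits
# (label, confidence) or None, and the per-home "student" is a simple
# nearest-centroid classifier over the X-sensor features.
from collections import defaultdict
from math import dist
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Sequence[float]
TeacherOutput = Optional[Tuple[str, float]]


def build_training_set(
    windows: Sequence[Tuple[Features, TeacherOutput]],
    user_labels: Sequence[Tuple[Features, str]],
    min_confidence: float = 0.8,
) -> List[Tuple[Features, str]]:
    """Keep confident A/V pseudo-labels, then add the few user-provided labels."""
    data = [
        (x, pred[0])
        for x, pred in windows
        if pred is not None and pred[1] >= min_confidence
    ]
    # Activities the A/V pipeline cannot detect get one labeled instance each.
    data.extend(user_labels)
    return data


def train_nearest_centroid(
    data: Sequence[Tuple[Features, str]],
) -> Callable[[Features], str]:
    """Average the X-sensor features per label; predict the closest centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in data:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

    def predict(x: Features) -> str:
        # Euclidean distance to each class centroid; the closest one wins.
        return min(centroids, key=lambda label: dist(centroids[label], x))

    return predict


# Illustrative data: 2-D X-sensor features with hypothetical activity names.
windows = [
    ([0.9, 0.1], ("washing_dishes", 0.95)),
    ([1.0, 0.2], ("washing_dishes", 0.90)),
    ([0.1, 0.9], ("walking", 0.85)),
    ([0.2, 1.0], ("walking", 0.40)),  # below min_confidence: dropped
    ([0.5, 0.5], None),               # A/V pipeline abstained: dropped
]
user_labels = [([0.9, 0.9], "knocking")]  # one label for an activity A/V misses

predict = train_nearest_centroid(build_training_set(windows, user_labels))
# After training, only X-sensor features are needed; A/V data can be discarded.
print(predict([0.95, 0.15]))  # washing_dishes
print(predict([0.15, 0.95]))  # walking
print(predict([0.85, 0.95]))  # knocking
```

The confidence filter is one simple way to limit noisy pseudo-labels from the A/V teacher; a real deployment would likely use richer sensor features and a stronger classifier than a nearest-centroid model.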

Supplemental Material

patidar.zip (278.3 KB)

Supplemental movie, appendix, image and software files for VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors

Index Terms

VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Sensors and actuators
2. Human-centered computing
  1. Ubiquitous and mobile computing
    1. Ubiquitous and mobile computing theory, concepts and paradigms
      1. Ambient intelligence


Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Volume 7, Issue 3
September 2023
1734 pages
EISSN: 2474-9567
DOI: 10.1145/3626192

Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 September 2023
- Published in IMWUT Volume 7, Issue 3

Author Tags
human activity recognition
privacy first design
ubiquitous sensing
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 0
- Total Downloads: 353
- Downloads (last 12 months): 353
- Downloads (last 6 weeks): 60
