research-article
Open Access

VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors

Published: 27 September 2023

Abstract

Audio and video modalities are commonly used for Human Activity Recognition (HAR), given the richness of the data and the availability of ML models pre-trained on large corpora of labeled data. However, audio and video sensors also raise significant consumer privacy concerns. Researchers have thus explored alternative modalities that are less privacy-invasive, such as mmWave Doppler radars, IMUs, and motion sensors. The key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of labeled training data, with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), in which training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to better protect the user's privacy. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For the activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for each activity not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach that uses one labeled instance per activity in each home (average accuracy of 79%), since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).
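
The core idea of the abstract, using the outputs of existing A/V models as training labels for a privacy-sensitive sensor, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the confidence threshold, the two-dimensional feature vectors, the activity names, and the nearest-centroid classifier are placeholders and are not taken from the VAX system.

```python
from collections import defaultdict
from math import dist


def pseudo_label(av_predictions, min_confidence=0.8):
    """Keep an A/V label only when its confidence reaches the threshold.

    av_predictions: one (label, confidence) pair per time window.
    The 0.8 threshold is an illustrative assumption.
    """
    return [label if conf >= min_confidence else None
            for label, conf in av_predictions]


def train_centroids(sensor_features, labels):
    """Fit a nearest-centroid model on windows that received a pseudo-label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(sensor_features, labels):
        if y is None:  # window had no confident A/V label; skip it
            continue
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Classify a sensor window without any audio or video input."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical time-aligned windows: A/V model outputs and sensor features.
av_predictions = [("cooking", 0.95), ("cooking", 0.90), ("vacuuming", 0.92),
                  ("vacuuming", 0.40), ("vacuuming", 0.88)]
sensor_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.5, 0.5], [0.2, 0.9]]

labels = pseudo_label(av_predictions)
centroids = train_centroids(sensor_features, labels)
print(predict(centroids, [0.8, 0.3]))
```

Once `centroids` is fitted, `predict` depends only on the sensor features, which mirrors the abstract's point that the camera and microphone can be removed after training.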

References

  1. Noura Abdi, Kopo M. Ramokapane, and Jose M. Such. 2019. More than Smart Speakers: Security and Privacy Perceptions of Smart Home Personal Assistants. In Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019). USENIX Association, Santa Clara, CA, 451--466. https://www.usenix.org/conference/soups2019/presentation/abdiGoogle ScholarGoogle Scholar
  2. Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A Large-Scale Video Classification Benchmark. arXiv:1609.08675 [cs.CV]Google ScholarGoogle Scholar
  3. Matheus Gabriel Acorsi, Leandro Maria Gimenez, and Maurício Martello. 2020. Assessing the performance of a low-cost thermal camera in proximal and aerial conditions. Remote Sensing 12, 21 (2020), 3591.Google ScholarGoogle ScholarCross RefCross Ref
  4. Antonio A Aguileta, Ramon F Brena, Oscar Mayora, Erik Molino-Minero-Re, and Luis A Trejo. 2019. Multi-sensor fusion for activity recognition---A survey. Sensors 19, 17 (2019), 3808.Google ScholarGoogle ScholarCross RefCross Ref
  5. Karan Ahuja, Yue Jiang, Mayank Goel, and Chris Harrison. 2021. Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 292, 10 pages. https://doi.org/10.1145/3411764.3445138Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Reed Albergotti. 2019. How Nest, designed to keep intruders out of people's homes, effectively allowed hackers to get in, researchers claim. https://www.washingtonpost.com/technology/2019/04/23/how-nest-designed-keep-intruders-out-peoples-homes-effectively-allowed-hackers-get/?noredirect=on.Google ScholarGoogle Scholar
  7. India Ashok. 2016. Hackers leave Finnish residents cold after DDoS attack knocks out heating systems. https://www.ibtimes.co.uk/hackers-leave-finnish-residents-cold-after-ddos-attack-knocks-out-heating-systems-1590639.Google ScholarGoogle Scholar
  8. Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. SoundNet: Learning Sound Representations from Unlabeled Video. In Proceedings of the 30th International Conference on Neural Information Processing Systems (Barcelona, Spain) (NIPS'16). Curran Associates Inc., Red Hook, NY, USA, 892--900.Google ScholarGoogle Scholar
  9. Bharathan Balaji, Jason Koh, Nadir Weibel, and Yuvraj Agarwal. 2016. Genie: A Longitudinal Study Comparing Physical and Software Thermostats in Office Buildings. In Proc. of the 2016 ACM Internat. Joint Conference on Pervasive and Ubiquitous Computing (Heidelberg, Germany) (UbiComp '16). ACM, New York, NY, USA, 1200--1211. https://doi.org/10.1145/2971648.2971719Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Alex Beltran, Varick L. Erickson, and Alberto E. Cerpa. 2013. ThermoSense: Occupancy Thermal Based Sensing for HVAC Control. In Proc. of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings (Roma, Italy) (BuildSys'13). ACM, New York, NY, USA, 1--8. https://doi.org/10.1145/2528282.2528301Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is Space-Time Attention All You Need for Video Understanding? arXiv:2102.05095 [cs.CV]Google ScholarGoogle Scholar
  12. Sejal Bhalla, Mayank Goel, and Rushil Khurana. 2021. IMU2Doppler: Cross-Modal Domain Adaptation for Doppler-based Activity Recognition Using IMU Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 4 (2021), 1--20.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Sudershan Boovaraghavan, Chen Chen, Anurag Maravi, Mike Czapik, Yang Zhang, Chris Harrison, and Yuvraj Agarwal. 2023. Mites: Design and Deployment of a General-Purpose Sensing Infrastructure for Buildings. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 1, Article 2 (mar 2023), 32 pages. https://doi.org/10.1145/3580865Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Bosch. 2022. Cross Domain Development Kit | XDK. https://www.bosch-connectivity.com/media/downloads/xdk/xdk_node_110_combined_datasheet.pdf.Google ScholarGoogle Scholar
  15. Hong Cai, Belal Korany, Chitra R Karanam, and Yasamin Mostofi. 2020. Teaching rf to sense without rf training measurements. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 4 (2020), 1--22.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kelly E. Caine, Arthur D. Fisk, and Wendy A. Rogers. 2006. Benefits and Privacy Concerns of a Home Equipped with a Visual Sensing System: A Perspective from Older Adults. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50, 2 (2006), 180--184. https://doi.org/10.1177/154193120605000203 arXiv:https://doi.org/10.1177/154193120605000203Google ScholarGoogle ScholarCross RefCross Ref
  17. Timothy I Cannings, Yingying Fan, and Richard J Samworth. 2020. Classification with imperfect training labels. Biometrika 107, 2 (2020), 311--330.Google ScholarGoogle ScholarCross RefCross Ref
  18. Song Cao and Ram Nevatia. 2016. Exploring deep learning based solutions in fine grained activity recognition in the wild. In 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, Cancun, Mexico, 384--389. https://doi.org/10.1109/ICPR.2016.7899664Google ScholarGoogle ScholarCross RefCross Ref
  19. João Carreira and Andrew Zisserman. 2017. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, Honolulu, HI, USA, 4724--4733. https://doi.org/10.1109/CVPR.2017.502Google ScholarGoogle ScholarCross RefCross Ref
  20. Youngjae Chang, Akhil Mathur, Anton Isopoussu, Junehwa Song, and Fahim Kawsar. 2020. A systematic study of unsupervised domain adaptation for robust human-activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 1 (2020), 1--30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Youngjae Chang, Akhil Mathur, Anton Isopoussu, Junehwa Song, and Fahim Kawsar. 2020. A systematic study of unsupervised domain adaptation for robust human-activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 1 (2020), 1--30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16 (2002), 321--357.Google ScholarGoogle ScholarCross RefCross Ref
  23. Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv:1906.07155 [cs.CV]Google ScholarGoogle Scholar
  24. Qingchao Chen, Bo Tan, Kevin Chetty, and Karl Woodbridge. 2016. Activity recognition based on micro-Doppler signature with in-home Wi-Fi. In 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom). IEEE, Munich, Germany, 1--6. https://doi.org/10.1109/HealthCom.2016.7749457Google ScholarGoogle ScholarCross RefCross Ref
  25. Wenqiang Chen, Shupei Lin, Elizabeth Thompson, and John Stankovic. 2021. Sensecollect: We need efficient ways to collect on-body sensor-based human activity data! Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 3 (2021), 1--27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Shohreh Deldari, Hao Xue, Aaqib Saeed, Jiayuan He, Daniel V. Smith, and Flora D. Salim. 2022. Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data. arXiv:2206.02353 [cs.LG]Google ScholarGoogle Scholar
  27. Shohreh Deldari, Hao Xue, Aaqib Saeed, Daniel V Smith, and Flora D Salim. 2022. COCOA: Cross Modality Contrastive Learning for Sensor Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1--28.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Florenc Demrozi, Graziano Pravadelli, Azra Bihorac, and Parisa Rashidi. 2020. Human Activity Recognition Using Inertial, Physiological and Environmental Sensors: A Comprehensive Survey. IEEE Access 8 (2020), 210816--210836. https://doi.org/10.1109/ACCESS.2020. 3037715Google ScholarGoogle ScholarCross RefCross Ref
  29. Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, and Tuomas Virtanen. 2020. Sound Event Detection with Depthwise Separable and Dilated Convolutions. In 2020 International Joint Conference on Neural Networks (IJCNN). IJCNN, Glasgow, UK, 1--7. https://doi.org/10.1109/IJCNN48605.2020.9207532Google ScholarGoogle ScholarCross RefCross Ref
  30. Haodong Duan, Jiaqi Wang, Kai Chen, and Dahua Lin. 2022. PYSKL: Towards Good Practices for Skeleton Action Recognition. In Proceedings of the 30th ACM International Conference on Multimedia (Lisboa, Portugal) (MM '22). Association for Computing Machinery, New York, NY, USA, 7351--7354. https://doi.org/10.1145/3503161.3548546Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, and Bo Dai. 2022. Revisiting Skeleton-based Action Recognition. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA, 2959--2968. https://doi.org/10.1109/ CVPR52688.2022.00298Google ScholarGoogle Scholar
  32. Pardis Emami-Naeini, Janarth Dheenadhayalan, Yuvraj Agarwal, and Lorrie Faith Cranor. 2021. Which Privacy and Security Attributes Most Impact Consumers' Risk Perception and Willingness to Purchase IoT Devices?. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, USA, 519--536. https://doi.org/10.1109/SP40001.2021.00112Google ScholarGoogle ScholarCross RefCross Ref
  33. Pardis Emami-Naeini, Henry Dixon, Yuvraj Agarwal, and Lorrie Faith Cranor. 2019. Exploring How Privacy and Security Factor into IoT Device Purchase Behavior. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI '19). Association for Computing Machinery, New York, NY, USA, 1--12. https://doi.org/10.1145/3290605.3300764Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Baris Erol, Sevgi Z. Gurbuz, and Moeness G. Amin. 2019. GAN-based Synthetic Radar Micro-Doppler Augmentations for Improved Human Activity Recognition. In 2019 IEEE Radar Conference (RadarConf). IEEE, Boston, MA, USA, 1--5. https://doi.org/10.1109/RADAR.2019.8835589Google ScholarGoogle ScholarCross RefCross Ref
  35. Christoph Feichtenhofer. 2020. X3D: Expanding Architectures for Efficient Video Recognition. arXiv:2004.04730 [cs.CV]Google ScholarGoogle Scholar
  36. Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. SlowFast Networks for Video Recognition. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, Seoul, Korea (South), 6201--6210. https://doi.org/10.1109/ICCV.2019.00630Google ScholarGoogle ScholarCross RefCross Ref
  37. Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. 2017. Audio Set: An ontology and human-labeled dataset for audio events. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, New Orleans, LA, USA, 776--780. https://doi.org/10.1109/ICASSP.2017.7952261Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Deepti Ghadiyaram, Du Tran, and Dhruv Mahajan. 2019. Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, 12038--12047. https://doi.org/10.1109/CVPR.2019.01232Google ScholarGoogle ScholarCross RefCross Ref
  39. Emily Green. 2018. Hacker terrorizes family by hijacking baby monitor. https://nordvpn.com/blog/baby-monitor-iot-hacking/.Google ScholarGoogle Scholar
  40. Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. 2018. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 6047--6056. https://doi.org/10.1109/CVPR.2018.00633Google ScholarGoogle ScholarCross RefCross Ref
  41. Harish Haresamudram, Irfan Essa, and Thomas Plötz. 2023. Investigating Enhancements to Contrastive Predictive Coding for Human Activity Recognition. In 2023 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, Atlanta, GA, USA, 232--241. https://doi.org/10.1109/PERCOM56429.2023.10099197Google ScholarGoogle ScholarCross RefCross Ref
  42. Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. ActivityNet: A large-scale video benchmark for human activity understanding. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Boston, MA, USA, 961--970. https://doi.org/10.1109/CVPR.2015.7298698Google ScholarGoogle ScholarCross RefCross Ref
  43. Zawar Hussain, Quan Z. Sheng, and Wei Emma Zhang. 2020. A review and categorization of techniques on device-free human activity recognition. Journal of Network and Computer Applications 167 (oct 2020), 102738. https://doi.org/10.1016/j.jnca.2020.102738Google ScholarGoogle ScholarCross RefCross Ref
  44. Texas Instruments. 2017. Awr1642 single-chip 77-and 79-ghz fmcw radar sensor., 60 pages.Google ScholarGoogle Scholar
  45. Texas Instruments. 2018. Dca1000evm data capture card. Retrieved May 17 (2018), 2022.Google ScholarGoogle Scholar
  46. S. Iwasawa, K. Ebihara, J. Ohya, and S. Morishima. 1998. Real-time human posture estimation using monocular thermal images. In Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, Nara, Japan, 492--497. https://doi.org/10.1109/AFGR.1998.670996Google ScholarGoogle ScholarCross RefCross Ref
  47. Yash Jain, Chi Ian Tang, Chulhong Min, Fahim Kawsar, and Akhil Mathur. 2022. ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 1, Article 17 (mar 2022), 28 pages. https://doi.org/10.1145/3517246Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Haojian Jin, Boyuan Guo, Rituparna Roychoudhury, Yaxing Yao, Swarun Kumar, Yuvraj Agarwal, and Jason I. Hong. 2022. Exploring the Needs of Users for Supporting Privacy-Protective Behaviors in Smart Homes. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI '22). Association for Computing Machinery, New York, NY, USA, Article 449, 19 pages. https://doi.org/10.1145/3491102.3517602Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Charmi Jobanputra, Jatna Bavishi, and Nishant Doshi. 2019. Human activity recognition: A survey. Procedia Computer Science 155 (2019), 698--703.Google ScholarGoogle ScholarCross RefCross Ref
  50. G. R. Kanagachidambaresan. 2021. Sensors and SBCs for Smart City Infrastructure. Springer International Publishing, Cham, 47--75. https://doi.org/10.1007/978-3-030-72957-8_3Google ScholarGoogle ScholarCross RefCross Ref
  51. Shian-Ru Ke, Hoang Le Uyen Thuc, Yong-Jin Lee, Jenq-Neng Hwang, Jang-Hee Yoo, and Kyoung-Ho Choi. 2013. A review on video-based human activity recognition. Computers 2, 2 (2013), 88--131.Google ScholarGoogle ScholarCross RefCross Ref
  52. H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. 2011. HMDB: A large video database for human motion recognition. In 2011 International Conference on Computer Vision. IEEE, Barcelona, Spain, 2556--2563. https://doi.org/10.1109/ICCV.2011.6126543Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Hyeokhyen Kwon, Catherine Tong, Harish Haresamudram, Yan Gao, Gregory D Abowd, Nicholas D Lane, and Thomas Ploetz. 2020. IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4, 3 (2020), 1--29.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Gierad Laput, Karan Ahuja, Mayank Goel, and Chris Harrison. 2018. Ubicoustics: Plug-and-Play Acoustic Activity Recognition. In Proc. of the 31st Annual ACM Symposium on UIST (Berlin, Germany) (UIST '18). ACM, New York, NY, USA, 213--224. https://doi.org/10.1145/3242587.3242609Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Gierad Laput and Chris Harrison. 2019. SurfaceSight: A New Spin on Touch, User, and Object Sensing for IoT Experiences. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI '19). Association for Computing Machinery, New York, NY, USA, 1--12. https://doi.org/10.1145/3290605.3300559Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Gierad Laput, Yang Zhang, and Chris Harrison. 2017. Synthetic Sensors: Towards General-Purpose Sensing. In Proc. of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). ACM, New York, NY, USA, 3986--3999. https://doi.org/10.1145/3025453.3025773Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Oscar D Lara and Miguel A Labrador. 2012. A survey on human activity recognition using wearable sensors. IEEE communications surveys & tutorials 15, 3 (2012), 1192--1209.Google ScholarGoogle Scholar
  58. Heju Li, Xin He, Xukai Chen, Yinyin Fang, and Qun Fang. 2019. Wi-motion: A robust human activity recognition using WiFi signals. IEEE Access 7 (2019), 153287--153299.Google ScholarGoogle ScholarCross RefCross Ref
  59. Xinyu Li, Yuan He, and Xiaojun Jing. 2019. A survey of deep learning-based human activity recognition in radar. Remote Sensing 11, 9 (2019), 1068.Google ScholarGoogle ScholarCross RefCross Ref
  60. Dawei Liang, Guihong Li, Rebecca Adaimi, Radu Marculescu, and Edison Thomaz. 2022. AudioIMU: Enhancing Inertial Sensing-Based Activity Recognition with Acoustic Models. In Proceedings of the 2022 ACM International Symposium on Wearable Computers (Cambridge, United Kingdom) (ISWC '22). Association for Computing Machinery, New York, NY, USA, 44--48. https://doi.org/10.1145/3544794. 3558471Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Ji Lin, Chuang Gan, and Song Han. 2019. TSM: Temporal Shift Module for Efficient Video Understanding. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, Seoul, Korea (South), 7082--7092. https://doi.org/10.1109/ICCV.2019.00718Google ScholarGoogle ScholarCross RefCross Ref
  62. Guocheng Liu, Caixia Zhang, Qingyang Xu, Ruoshi Cheng, Yong Song, Xianfeng Yuan, and Jie Sun. 2020. I3d-shufflenet based human action Recognition. Algorithms 13, 11 (2020), 301.Google ScholarGoogle ScholarCross RefCross Ref
  63. Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C Kot. 2020. NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 10 (2020), 2684--2701.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Sicong Liu, Junzhao Du, Anshumali Shrivastava, and Lin Zhong. 2019. Privacy Adversarial Network. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3, 4 (dec 2019), 1--18. https://doi.org/10.1145/3369816Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, and Tong Lu. 2021. TAM: Temporal Adaptive Module for Video Recognition. arXiv:2005.06803 [cs.CV]Google ScholarGoogle Scholar
  66. Ginés Hidalgo Martınez. 2019. OpenPose: Whole-body pose estimation. Ph. D. Dissertation. Master's Thesis, Carnegie Mellon University.Google ScholarGoogle Scholar
  67. Shinya Misaki, Keisuke Umakoshi, Tomokazu Matsui, Hyuckjin Choi, Manato Fujimoto, and Keiichi Yasumoto. 2021. Non-Contact In-Home Activity Recognition System Utilizing Doppler Sensors. In Adjunct Proceedings of the 2021 International Conference on Distributed Computing and Networking (Nara, Japan) (ICDCN '21). Association for Computing Machinery, New York, NY, USA, 169--174. https://doi.org/10.1145/3427477.3429463Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Mites.io. 2020. Mites.io: a full-stack ubiquitous sensing platform. https://mites.io/.Google ScholarGoogle Scholar
  69. MMAction2. 2020. OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark. https://github.com/open-mmlab/mmaction2.Google ScholarGoogle Scholar
  70. MMPose. 2020. OpenMMLab Pose Estimation Toolbox and Benchmark. https://github.com/open-mmlab/mmpose.Google ScholarGoogle Scholar
  71. Vimal Mollyn, Karan Ahuja, Dhruv Verma, Chris Harrison, and Mayank Goel. 2022. SAMoSA: Sensing Activities with Motion and Subsampled Audio. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1--19.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Muhammad Muaaz, Ali Chelli, Ahmed Abdelmonem Abdelgawwad, Andreu Català Mallofré, and Matthias Pätzold. 2020. WiWeHAR: Multimodal human activity recognition using Wi-Fi and wearable sensing modalities. IEEE access 8 (2020), 164453--164470.Google ScholarGoogle Scholar
  73. Sebastian Münzner, Philip Schmidt, Attila Reiss, Michael Hanselmann, Rainer Stiefelhagen, and Robert Dürichen. 2017. CNN-Based Sensor Fusion Techniques for Multimodal Human Activity Recognition. In Proceedings of the 2017 ACM International Symposium on Wearable Computers (Maui, Hawaii) (ISWC '17). Association for Computing Machinery, New York, NY, USA, 158--165. https://doi.org/10.1145/3123021.3123046Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Curtis Northcutt, Lu Jiang, and Isaac Chuang. 2021. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research 70 (2021), 1373--1411.Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Francisco Javier Ordóñez and Daniel Roggen. 2016. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 16, 1 (2016). https://doi.org/10.3390/s16010115Google ScholarGoogle ScholarCross RefCross Ref
  76. Shijia Pan, Mario Berges, Juleen Rodakowski, Pei Zhang, and Hae Young Noh. 2019. Fine-Grained Recognition of Activities of Daily Living through Structural Vibration and Electrical Sensing. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (New York, NY, USA) (BuildSys '19). Association for Computing Machinery, New York, NY, USA, 149--158. https://doi.org/10.1145/3360322.3360851Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Preksha Pareek and Ankit Thakkar. 2021. A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artificial Intelligence Review 54, 3 (2021), 2259--2322.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Liangying Peng, Ling Chen, Zhenan Ye, and Yi Zhang. 2018. AROMA: A Deep Multi-Task Learning Based Simple and Complex Human Activity Recognition Method Using Wearable Sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 2, Article 74 (jul 2018), 16 pages. https://doi.org/10.1145/3214277Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Joseph Phelps, Glen Nowak, and Elizabeth Ferrell. 2000. Privacy Concerns and Consumer Willingness to Provide Personal Information. Journal of Public Policy & Marketing 19, 1 (2000), 27--41. http://www.jstor.org/stable/30000485Google ScholarGoogle ScholarCross RefCross Ref
  80. Prasoon Patidar, Mayank Goel, Yuvraj Agarwal. 2023. VAX: Open-source repository for the VAX system. https://github.com/synergylabs/vax.Google ScholarGoogle Scholar
  81. Riccardo Presotto, Gabriele Civitarese, and Claudio Bettini. 2022. Federated Clustering and Semi-Supervised learning: A new partnership for personalized Human Activity Recognition. Pervasive and Mobile Computing 88 (2022), 101726.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Valentin Radu and Maximilian Henne. 2019. Vision2sensor: Knowledge transfer across sensing modalities for human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3, 3 (2019), 1--21.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Bhiksha Raj, Kaustubh Kalgaonkar, Chris Harrison, and Paul Dietz. 2012. Ultrasonic Doppler Sensing in HCI. IEEE Pervasive Computing 11, 2 (2012), 24--29. https://doi.org/10.1109/MPRV.2012.17Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Sreenivasan Ramasamy Ramamurthy and Nirmalya Roy. 2018. Recent trends in machine learning for human activity recognition---A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, 4 (2018), e1254.Google ScholarGoogle ScholarCross RefCross Ref
  85. Suneth Ranasinghe, Fadi Al Machot, and Heinrich C Mayr. 2016. A review on applications of activity recognition systems with regard to performance and evaluation. International Journal of Distributed Sensor Networks 12, 8 (2016), 1550147716665520. https://doi.org/10.1177/1550147716665520 arXiv:https://doi.org/10.1177/1550147716665520Google ScholarGoogle ScholarCross RefCross Ref
  86. Lipsarani Sahoo, Nazmus Sakib Miazi, Mohamed Shehab, Florian Alt, and Yomna Abdelrahman. 2022. You Know Too Much: Investigating Users' Perceptions and Privacy Concerns Towards Thermal Imaging. In Privacy Symposium 2022, Stefan Schiffner, Sebastien Ziegler, and Adrian Quesada Rodriguez (Eds.). Springer International Publishing, Cham, 207--229.Google ScholarGoogle Scholar
  87. Alex Schiffer. 2017. How a fish tank helped hack a casino. https://www.washingtonpost.com/news/innovations/wp/2017/07/21/how-a-fish-tank-helped-hack-a-casino/?noredirect=on.Google ScholarGoogle Scholar
  88. Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. 2016. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. arXiv:1604.02808 [cs.CV]Google ScholarGoogle Scholar
  89. Hao Shao, Shengju Qian, and Yu Liu. 2020. Temporal Interlacing Network. Proceedings of the AAAI Conference on Artificial Intelligence 34, 07 (Apr. 2020), 11966--11973. https://doi.org/10.1609/aaai.v34i07.6872Google ScholarGoogle ScholarCross RefCross Ref
  90. Lei Shi, Yifan Zhang, Jian Cheng, and Hanqing Lu. 2019. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, 12018--12027. https://doi.org/10.1109/CVPR.2019.01230Google ScholarGoogle ScholarCross RefCross Ref
  91. Akash Deep Singh, Sandeep Singh Sandha, Luis Garcia, and Mani Srivastava. 2019. RadHAR: Human Activity Recognition from Point Clouds Generated through a Millimeter-Wave Radar. In Proceedings of the 3rd ACM Workshop on Millimeter-Wave Networks and Sensing Systems (Los Cabos, Mexico) (mmNets'19). Association for Computing Machinery, New York, NY, USA, 51--56. https://doi.org/10.1145/3349624.3356768Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv:1212.0402 [cs.CV]Google ScholarGoogle Scholar
  93. Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, and Cordelia Schmid. 2018. Actor-Centric Relation Network. In Computer Vision -- ECCV 2018, Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Springer International Publishing, Cham, 335--351.Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. 2019. Deep High-Resolution Representation Learning for Human Pose Estimation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, 5686--5696. https://doi.org/10.1109/CVPR.2019.00584Google ScholarGoogle ScholarCross RefCross Ref
  95. Vishnu Priya Thotakura and Purnachand Nalluri. 2022. Convolutional 3D in Activity Recognition -A Review. In 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP). IEEE, Vijayawada, India, 1--6. https://doi.org/10.1109/AISP53593.2022. 9760638Google ScholarGoogle ScholarCross RefCross Ref
  96. Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. 2018. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 6450--6459. https://doi.org/10.1109/CVPR.2018.00675Google ScholarGoogle ScholarCross RefCross Ref
  97. Kimberly T. Tran, Lewis D. Griffin, Kevin Chetty, and Shelly Vishwakarma. 2020. Transfer Learning from Audio Deep Learning Models for Micro-Doppler Activity Recognition. In 2020 IEEE International Radar Conference (RADAR). IEEE, Washington, DC, USA, 584--589. https://doi.org/10.1109/RADAR42522.2020.9114643
  98. Michalis Vrigkas, Christophoros Nikou, and Ioannis A Kakadiaris. 2015. A review of human activity recognition methods. Frontiers in Robotics and AI 2 (2015), 28.
  99. Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2016. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. In Computer Vision -- ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 20--36.
  100. Pete Warden, Matthew Stewart, Brian Plancher, Colby Banbury, Shvetank Prakash, Emma Chen, Zain Asgar, Sachin Katti, and Vijay Janapa Reddi. 2022. Machine Learning Sensors. https://doi.org/10.48550/ARXIV.2206.03266
  101. Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krahenbuhl, and Ross Girshick. 2019. Long-term feature banks for detailed video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 284--293.
  102. Tong Wu, Murtadha Aldeer, Tahiya Chowdhury, Amber Haynes, Fateme Nikseresht, Mahsa Pahlavikhah Varnosfaderani, Jiechao Gao, Arsalan Heydarian, Brad Campbell, and Jorge Ortiz. 2021. The Smart Building Privacy Challenge. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (Coimbra, Portugal) (BuildSys '21). Association for Computing Machinery, New York, NY, USA, 238--239. https://doi.org/10.1145/3486611.3492234
  103. Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2018. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI'18/IAAI'18/EAAI'18). AAAI Press, New Orleans, Louisiana, USA, Article 912, 9 pages.
  104. Ceyuan Yang, Yinghao Xu, Jianping Shi, Bo Dai, and Bolei Zhou. 2020. Temporal Pyramid Network for Action Recognition. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, 588--597. https://doi.org/10.1109/CVPR42600.2020.00067
  105. Deju Yang, Liangli Ma, and Fei Liao. 2019. An Intelligent Voice Interaction System Based on Raspberry Pi. In 2019 11th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Vol. 1. IEEE, Hangzhou, China, 237--240. https://doi.org/10.1109/IHMSC.2019.00062
  106. Yang Yang, Chunping Hou, Yue Lang, Dai Guan, Danyang Huang, and Jinchen Xu. 2019. Open-set human activity recognition based on micro-Doppler signatures. Pattern Recognition 85 (2019), 60--69.
  107. Zhaoyuan Yang, Yang Zhao, and Weizhong Yan. 2020. Adversarial Vulnerability in Doppler-based Human Activity Recognition. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, Glasgow, UK, 1--7. https://doi.org/10.1109/IJCNN48605.2020.9207686
  108. Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba. 2018. Temporal Relational Reasoning in Videos. arXiv:1711.08496 [cs.CV]

Published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3, September 2023, 1734 pages. EISSN: 2474-9567. DOI: 10.1145/3626192

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher: Association for Computing Machinery, New York, NY, United States