I Am an Earphone and I Can Hear My User’s Face: Facial Landmark Tracking Using Smart Earphones

Published: 16 December 2023

Abstract

This article presents EARFace, a system that demonstrates the feasibility of tracking facial landmarks for 3D facial reconstruction using in-ear acoustic sensors embedded within smart earphones. This enables a number of applications in facial expression tracking, user interfaces, AR/VR, affective computing, and accessibility, among others. Whereas conventional vision-based solutions break down under poor lighting and occlusions and also raise privacy concerns, earphone platforms are robust to ambient conditions while being privacy-preserving. In contrast to prior work on earable platforms that performs outer-ear sensing for facial motion tracking, EARFace shows the feasibility of completely in-ear sensing with a natural earphone form factor, thus enhancing wearing comfort. The core intuition exploited by EARFace is that the shape of the ear canal changes due to the movement of facial muscles during facial motion. EARFace tracks these changes in ear canal shape by measuring the ultrasonic channel frequency response of the inner ear, which ultimately enables tracking of the facial motion. A transformer-based machine learning model is designed to exploit spectral and temporal relationships in the ultrasonic channel frequency response data to predict the facial landmarks of the user with an accuracy of 1.83 mm. Using these predicted landmarks, a 3D graphical model of the face that replicates the precise facial motion of the user is then reconstructed. Domain adaptation is further performed by adapting the weights of layers with group-wise, differential learning rates, which decreases the training overhead of EARFace. The transformer-based machine learning model runs on smartphones with a processing latency of 13 ms and a low overall power consumption profile. Finally, usability studies indicate higher levels of comfort when wearing EARFace's earphone platform in comparison with alternative form factors.
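To make the sensing step concrete, the sketch below shows one way an ultrasonic channel frequency response could be estimated: play an inaudible sweep into the ear canal and divide the spectrum of the in-ear microphone recording by the spectrum of the transmitted probe over the sweep band. This is a minimal illustration, not EARFace's implementation; the sample rate, the 18-21 kHz band, the sweep duration, and the plain spectral-division deconvolution are all assumed parameters.

```python
# Minimal sketch (assumed parameters, not EARFace's code): estimate the in-ear
# channel frequency response from a near-ultrasonic sweep and its recording.
import numpy as np
from scipy.signal import chirp

FS = 48_000                      # assumed sample rate (Hz)
DUR = 0.05                       # assumed sweep duration (s)
F0, F1 = 18_000, 21_000          # assumed near-ultrasonic probe band (Hz)

t = np.arange(int(FS * DUR)) / FS
probe = chirp(t, f0=F0, f1=F1, t1=DUR, method="linear")   # transmitted sweep x(t)

def channel_frequency_response(recorded: np.ndarray) -> np.ndarray:
    """Estimate |H(f)| = |Y(f) / X(f)| over the probe band from the in-ear mic signal y(t)."""
    n = len(probe)
    X = np.fft.rfft(probe, n)
    Y = np.fft.rfft(recorded[:n], n)
    freqs = np.fft.rfftfreq(n, d=1 / FS)
    band = (freqs >= F0) & (freqs <= F1)
    return np.abs(Y[band] / (X[band] + 1e-12))             # epsilon avoids divide-by-zero bins

# Toy usage: a synthetic "recording" that merely attenuates and delays the probe.
recorded = 0.3 * np.roll(probe, 5)
print(channel_frequency_response(recorded)[:5])
```

Tracking how this band-limited response vector evolves from frame to frame is what would supply the spectral and temporal features that the abstract's transformer model consumes.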

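Similarly, the group-wise, differential learning rates mentioned for domain adaptation can be pictured as assigning progressively larger learning rates to later layer groups when fine-tuning a pretrained landmark model on a new user's data. The sketch below uses PyTorch parameter groups purely for illustration; the layer grouping, the specific rates, and the tiny stand-in model are assumptions and do not reflect EARFace's actual transformer architecture or training code.

```python
# Minimal sketch (assumed, not the authors' code): group-wise differential
# learning rates for adapting a pretrained landmark regressor to a new user.
import torch
import torch.nn as nn

# Stand-in for a pretrained model: "early" and "middle" feature layers plus a
# landmark-regression head producing 68 x 3 coordinates (all sizes assumed).
model = nn.Sequential(
    nn.Linear(256, 256), nn.ReLU(),   # early layers: mostly generic features
    nn.Linear(256, 256), nn.ReLU(),   # middle layers
    nn.Linear(256, 68 * 3),           # user-specific regression head
)
early, middle, head = model[0], model[2], model[4]

# Group-wise, differential learning rates: earlier groups change slowly,
# the head adapts fastest to the new user, reducing retraining overhead.
optimizer = torch.optim.Adam([
    {"params": early.parameters(),  "lr": 1e-5},
    {"params": middle.parameters(), "lr": 1e-4},
    {"params": head.parameters(),   "lr": 1e-3},
])

def adaptation_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One fine-tuning step on a small batch of the new user's labeled data."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random stand-in data: (batch, features) -> (batch, 68 * 3).
print(adaptation_step(torch.randn(8, 256), torch.randn(8, 68 * 3)))
```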

Published in

ACM Transactions on Internet of Things, Volume 5, Issue 1 (February 2024), 181 pages
EISSN: 2577-6207
DOI: 10.1145/3613526
Editor: Gian Pietro Picco

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 December 2023
        • Online AM: 9 August 2023
        • Accepted: 24 July 2023
        • Revised: 8 April 2023
        • Received: 16 October 2022