Abstract
This article presents EARFace, a system that demonstrates the feasibility of tracking facial landmarks for 3D facial reconstruction using in-ear acoustic sensors embedded within smart earphones. This enables applications in facial expression tracking, user interfaces, AR/VR, affective computing, and accessibility, among others. Conventional vision-based solutions break down under poor lighting and occlusions and also raise privacy concerns, whereas earphone platforms are robust to ambient conditions while being privacy-preserving. In contrast to prior work on earable platforms that performs outer-ear sensing for facial motion tracking, EARFace demonstrates fully in-ear sensing with a natural earphone form factor, improving wearing comfort. The core intuition exploited by EARFace is that the shape of the ear canal changes as facial muscles move during facial motion. EARFace tracks these shape changes by measuring the ultrasonic channel frequency response of the inner ear, and from these measurements tracks the facial motion itself. A transformer-based machine learning model exploits spectral and temporal relationships in the ultrasonic channel frequency response data to predict the user's facial landmarks with an accuracy of 1.83 mm. From the predicted landmarks, a 3D graphical model of the face that replicates the user's precise facial motion is reconstructed. Domain adaptation is further performed by adapting layer weights using group-wise, differential learning rates, which reduces EARFace's training overhead. The transformer-based model runs on smartphones with a processing latency of 13 ms and a low overall power consumption profile. Finally, usability studies indicate higher wearing comfort for EARFace's earphone platform in comparison with alternative form factors.
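The sensing principle summarized above, probing the ear canal with an ultrasonic sweep and estimating the channel frequency response from the received signal, can be sketched as follows. This is a minimal numpy sketch with a toy one-tap channel; the sample rate, sweep band, and channel model are illustrative assumptions, not the article's actual parameters.

```python
import numpy as np

# Probe parameters (illustrative assumptions, not the article's values)
fs = 48_000              # sample rate (Hz)
dur = 0.01               # sweep duration (s)
f0, f1 = 18_000, 21_000  # ultrasonic sweep band (Hz)

t = np.arange(int(fs * dur)) / fs
# Linear chirp: instantaneous frequency rises from f0 to f1 over the sweep
probe = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * dur) * t ** 2))

# Toy ear-canal channel: attenuation plus a small delay; a real canal
# imposes a frequency-dependent response that shifts with facial motion
received = 0.6 * np.roll(probe, 5)

# Channel frequency response: spectrum ratio of received to probe signal
P = np.fft.rfft(probe)
R = np.fft.rfft(received)
freqs = np.fft.rfftfreq(len(probe), 1 / fs)
band = (freqs >= f0) & (freqs <= f1)
cfr = np.abs(R[band]) / (np.abs(P[band]) + 1e-12)  # in-band magnitude response
```

In the toy channel the in-band magnitude response is flat at the attenuation factor; in the real system, changes in this response across repeated sweeps would carry the facial-motion signal that the transformer model consumes.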
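The group-wise, differential learning rates mentioned for domain adaptation can be sketched as follows. This is a hedged illustration in the spirit of discriminative fine-tuning: earlier, more general layer groups receive smaller learning rates than the final, task-specific group. The function name and the decay factor of 2.6 are assumptions for illustration, not the article's published schedule.

```python
def group_lrs(num_groups, base_lr=1e-3, decay=2.6):
    """Assign a smaller learning rate to earlier (more general) layer
    groups and the full base rate to the last (task-specific) group."""
    return [base_lr / decay ** (num_groups - 1 - g) for g in range(num_groups)]

# With 4 layer groups, the first group adapts far more slowly than the last
lrs = group_lrs(4)
```

In a framework such as PyTorch, these per-group rates would be supplied as optimizer parameter groups, so that adaptation to a new user updates later layers aggressively while barely perturbing earlier ones, which is what reduces the training overhead.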
I Am an Earphone and I Can Hear My User’s Face: Facial Landmark Tracking Using Smart Earphones