Abstract
Wearables that record audio continuously have various applications such as health monitoring and cognitive augmentation. However, they raise serious privacy concerns amongst bystanders and conversation partners, reducing their social acceptability. To address this, we designed the MeMic, a wearable that records audio only when the user speaks, driven by a hardware-based voice activity detector. A visible light on the wearable indicates when it is actively recording, to build trust among others. We validate its performance with participants (N=12) wearing the MeMic and performing tasks. Further, an online study (N=168) compared the social acceptability of the MeMic’s self-recording paradigm versus continuous recording. We find significantly fewer social fears alongside reduced privacy concerns in the self-recording paradigm, thereby improving social acceptability. We also explore different conceptual form factors (glasses, pendant necklace, and behind-the-neck) for the MeMic and find that the pendant necklace is the most preferred. This work contributes towards enhancing the social comfort of wearables that continuously capture users’ speech.
1 INTRODUCTION
Continuous, all-day capture of speech is critical for many applications in the field of human-computer interaction (HCI). Research areas such as healthcare monitoring, mood sensing, memory and intelligence augmentation, and social behavior monitoring all benefit from access to users’ speech signals. Speech capture is often achieved by running continuous audio recording on a wearable device. However, when continuous capture systems run outside constrained settings such as research laboratories, the privacy of the audio data recorded from both bystanders and the user becomes a key issue [3]. While speech privacy is paramount for bystanders, the non-speech signals from the user's environment can also reveal sensitive activities [23]. These issues decrease the social acceptability of such systems when deployed, as was the case for Google Glass [10, 11]. The problem is compounded because several nations and states legally require all parties in a conversation to consent to being recorded [4].
To improve the social acceptability of continuous audio recording wearables, we designed the MeMic, a self voice-activity-detector (self-VAD) wearable that records only the voice of the user. The MeMic is focused on maximizing privacy by never sensing, recording, or processing any speech signal except those that originate from the user. The MeMic features a microphone that is turned off by default. It senses when the user is speaking, and only then does it turn on the microphone and record audio, with a visible light on the wearable indicating that it is recording. We hypothesize that the MeMic will have higher social acceptability than a continuous audio recording wearable.
We conducted a lab study (N=12) to evaluate the MeMic’s performance while participants wore it in a series of tasks, and measured a detection error rate of 0.09 ± 0.03. Further, to test if the MeMic is socially acceptable, we ran an online study (N=168), which showed that the MeMic had significantly improved social acceptability as compared to a continuously recording wearable.
In summary, this paper makes the following contributions:
• The technical design and evaluation of a self-recording audio wearable using an accelerometer.
• An assessment showing significantly higher social acceptability of a self-recording wearable compared to a continuous recording wearable, attributed to diminished social fears and privacy concerns.
• An investigation into user perceptions and preferences regarding four different form factors of the self-recording wearable.
2 BACKGROUND AND RELATED WORK
2.1 Self Voice Activity Detection
Self Voice Activity Detection (self-VAD) can be defined as detecting if the user, and only the user, is speaking. One way to recognize only the user’s voice is to combine voice activity detection (VAD) with speaker diarization and verification. VAD has been a well-researched area [20] ever since Sohn et al. [21] developed a statistical model to detect speech in 1999. Machine learning has significantly enhanced VAD accuracy, with methods such as Gaussian mixture models [28], recurrent neural networks [7], and deep belief networks [27]. Speaker diarization and verification algorithms [8] can then confirm the speaker’s identity. However, these software-based methods typically record a segment of speech before processing it and deciding whether to save it. Because audio must be recorded before VAD can be performed, they raise privacy and legal concerns in situations where it is not appropriate to record others.
There also exist hardware-based VAD solutions [26]. One example is a contact-based VAD, which can be realized by placing an inertial measurement unit (IMU) in contact with the user’s skin and detecting speech-related vibrations originating from the user. Although the concept of contact-based VAD has been demonstrated in prior work [1, 2, 16], its application in wearable, privacy-preserving devices has not been explored.
2.2 Social Acceptability in Wearables
With wearable devices becoming more ubiquitous and used in social contexts, the HCI community has grown increasingly interested in understanding their social acceptability [14]. This is especially relevant for wearables that continuously record audio, as they can impact the privacy of those around the wearer [3]. There are several methods to assess the social acceptability of wearables, including diary studies [15], online surveys [5, 18], and lab tests [19]. The WEAR Scale, created by Kelly et al. [9], offers a standardized way to evaluate this aspect through a survey. It has been applied to existing devices like the Apple Watch, Google Glass, and brain-sensing headsets, allowing for consistent comparisons of wearables’ social acceptability [10].
The perceived utility of a wearable can influence its social acceptability, as shown by Profita et al. [18] through an online study, where the presence of a disability changed how bystanders viewed a wearable. Additionally, Williamson et al. [25] suggest that interactions which explain the device’s function are more socially acceptable. This inspired the LED indicator on the MeMic, which signals active recording.
3 SYSTEM DESIGN
The MeMic is designed with the core principle of privacy, which necessitates that the device remains entirely inactive by default. This means that no audio data is captured or processed unless it is specifically activated by the user speaking. The MeMic uses an IMU for detecting speech. When the IMU senses specific signals associated with speaking, it triggers the microphone to turn on. This ensures that the microphone only records when the user is actively speaking. Once the user stops speaking, the IMU no longer detects the speech-related vibration/movements, and the microphone is turned off. To maintain transparency and respect others’ privacy, a light indicator on the device turns on whenever the microphone is active, signaling that recording is in progress. The recorded data can be stored on an SD card within the device or transmitted to a smartphone via Bluetooth.
3.1 Form Factors
We explored multiple form factors for the MeMic as shown in Fig. 1. Prior work [6, 12] shows that the speech signal can be obtained from the chest, ears, nose, and neck. As people's form factor preferences vary, we made demonstration versions of the MeMic in several form factors, which we assessed in our social acceptability study. The four form factors can be seen in Fig. 1 and included a snug necklace (a necklace-like device designed to fit snugly around the neck), a pendant necklace, glasses, and earbuds, where the connecting electronics/mechanical design circles behind the user’s head/neck. For the technical evaluation in this paper (the self-VAD accuracy study), we use only the snug necklace to ensure optimal contact with the larynx.
3.2 Mechanical Design and Electronics
The necklace was designed to be lightweight and small so as not to cause discomfort over long periods of wear [13]. The weight of the necklace MeMic is 20 grams, measuring 43x14 mm. The MeMic’s core electronics are built around the Seeeduino XIAO NRF52 Sense BLE microcontroller breakout board, powered by a 105mAh LiPo battery. Further details on the materials used and firmware can be found in Appendix A.
3.3 Voice Activity Detection Algorithm
The self voice activity detection (self-VAD) algorithm of the MeMic, running locally on the NRF52, begins with a sliding window that holds the most recent IMU data. At a fixed interval, a bandpass filter (BPF) is applied to the window of IMU data, narrowing the frequency range to voice-related frequencies [17, 24] (85Hz to 400Hz). Three features are then calculated on the z-axis acceleration of the IMU: root mean square (RMS), zero crossings (ZC), and rate of change of RMS (RRMS). Higher values correlate with active speaking, and thresholds for the three metrics were chosen by hand through iterative testing. The microphone is activated only when RMS, ZC, and RRMS all exceed their respective thresholds. The thresholds used were \(RMS\_threshold=0.0225\), \(ZC\_threshold=0.21\) and \(RRMS\_threshold=0.0195\). The sliding window held 40 samples at an IMU sampling rate of 833Hz (about 48ms of data), and the BPF was applied at an interval of 50ms.
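The threshold decision described above can be sketched in a few lines. This is a minimal illustration, not the MeMic firmware (which is written in C++ for Arduino): it assumes the window has already been band-pass filtered, that ZC is expressed as the fraction of adjacent sample pairs that change sign, and that RRMS is the change in RMS since the previous evaluation. None of these definitions is spelled out in the text, and the names `SelfVad`, `rms`, and `zero_crossing_rate` are hypothetical.

```python
import math
from dataclasses import dataclass

# Thresholds reported in the text; how each feature is normalised is an assumption.
RMS_THRESHOLD = 0.0225
ZC_THRESHOLD = 0.21
RRMS_THRESHOLD = 0.0195


def rms(window):
    """Root mean square of a window of z-axis acceleration samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose sign differs (assumed definition)."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return crossings / (len(window) - 1)


@dataclass
class SelfVad:
    """Remembers the previous RMS so a rate of change can be computed."""
    prev_rms: float = 0.0

    def update(self, window):
        """Return True when all three features exceed their thresholds."""
        current_rms = rms(window)
        rrms = current_rms - self.prev_rms  # assumed: change since the last evaluation
        self.prev_rms = current_rms
        return (current_rms > RMS_THRESHOLD
                and zero_crossing_rate(window) > ZC_THRESHOLD
                and rrms > RRMS_THRESHOLD)
```

In use, `update` would be called every 50ms with the latest 40 filtered samples, and its result would switch the microphone and trust light together. With this reading of RRMS the decision fires on the onset of vibration, so a real device would presumably need some hold logic to stay on during sustained speech; the text does not describe that step.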
3.4 Trust Light
The "trust light" on the MeMic, prominently positioned on each form factor, acts as a real-time visual cue to bystanders indicating active recording. It illuminates when the device detects speech and starts recording, and turns off once speech ends, providing immediate, transparent feedback. This feature aims to foster trust and social acceptability by clearly signaling when recording occurs.
4 USER STUDY - SELF VOICE ACTIVITY DETECTION ACCURACY
To evaluate the performance of the MeMic’s self-VAD, we designed a study where participants would wear the MeMic and the self-VAD performance would be assessed during a variety of speaking, listening, and movement activities. These activities were designed to mimic real-life activities and interactions that a user would engage in while using a MeMic.
4.1 Study Design and Metrics
Participants, wearing the "snug necklace form factor" MeMic (chosen for optimal contact with the larynx), sat in front of a webcam-equipped laptop. Instructions were provided via a web interface as shown in Appendix B. The study involved three activities: reading text aloud, performing movements (nodding, swaying, and drinking water), and silently listening to 30 seconds of pre-recorded speech.
A standard measure of VAD performance is the detection error rate, described in Equation 1. It is the sum of the false alarm rate (non-self-speech incorrectly classified as self-speech) and the missed detection rate (self-speech incorrectly classified as non-self-speech), each normalized by the total duration of self-speech in the ground truth. Ground truth was established by applying a proven voice activity detection algorithm [22] to the task’s recorded webcam audio. We then compare this to the MeMic’s own voice activity detection, which is inferred by using a light detector to determine when the device’s "trust light" (which activates when the MeMic detects speech) is illuminated in the video frames.
\[ \text{Detection Error Rate} = \frac{\text{False Alarms} + \text{Missed Detections}}{\text{Total Duration}} \tag{1} \]
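As a worked illustration of this metric, the sketch below computes the three rates from two frame-aligned label sequences. It assumes both the ground truth and the trust-light detection have been reduced to one boolean per video frame of equal duration, and it takes the denominator to be the number of ground-truth self-speech frames, following the prose definition. The function name `detection_error_rate` and the example labels are hypothetical.

```python
def detection_error_rate(truth, predicted):
    """Compute error rates from equal-length boolean sequences, one label per frame.

    truth:     True where the reference VAD marks self-speech.
    predicted: True where the trust light is detected as on.
    """
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    false_alarms = sum(1 for t, p in zip(truth, predicted) if p and not t)
    missed = sum(1 for t, p in zip(truth, predicted) if t and not p)
    speech_frames = sum(1 for t in truth if t)
    if speech_frames == 0:
        raise ValueError("ground truth contains no self-speech")
    return {
        "false_alarm_rate": false_alarms / speech_frames,
        "missed_detection_rate": missed / speech_frames,
        "detection_error_rate": (false_alarms + missed) / speech_frames,
    }


# Made-up labels: the light turns on one frame late and stays on one frame too long.
truth = [False, True, True, True, True, False, False, False]
predicted = [False, False, True, True, True, True, False, False]
rates = detection_error_rate(truth, predicted)
```

Here one missed frame and one false-alarm frame over four frames of true speech give a detection error rate of 0.5.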
4.2 Participants
12 participants (age range = 23-34, mean age = 26.0, SD = 3.2) were recruited from the university mailing list. An approximately sex-balanced sample was recruited (male=7, female=5) because voice frequencies vary between the sexes. The participants were fluent in English, and had normal or corrected-to-normal speech and hearing.
5 USER STUDY - SOCIAL ACCEPTABILITY
We designed a study to examine the social acceptability of the MeMic. The study design involved showing participants pre-recorded videos of a person using the MeMic in three everyday situations. Use cases were shown to address the privacy-utility tradeoff [29]. The use cases presented were (i) a memory assistant (automatic ’to-do list’ creation) (ii) a communication skills improvement assistant, and (iii) real-time mood monitoring. This video demonstrated the pendant necklace form factor. To activate and deactivate the MeMic’s light when speaking, the actor pressed a hidden button. The video (Video A) contained a narrative that explained how self-recording in the MeMic worked and what the trust light means.
For a comparison of the self-recording capability of MeMic with a continuously recording wearable, we made Video B by modifying Video A such that the MeMic’s light was constantly on. This version of the video also had narration, which described that the microphone was continuously on, and described what the trust light meant. These videos (A and B), identical in all regards except for the light on the MeMic and narration, contrasted the social acceptability of self-VAD versus continuous voice recording. Both videos are available in the supplementary material.
Further, to gauge user preferences for the MeMic’s different designs, we created videos displaying alternative form factors not shown in the main video (glasses, behind-the-neck earbuds, and a pendant necklace without an LED).
5.1 Study Design and Metrics
In a between-subjects study, participants were randomly assigned to either the MeMic condition (Video A) or the continuous recording condition (Video B). After viewing their assigned video, they assessed its social acceptability using the WEAR Scale survey. Subsequently, they viewed videos of other MeMic form factors and responded to a selected subset of four WEAR Scale survey questions. The procedure and subset are shown in Appendix D.
The primary measure was the WEAR Scale [9], which consists of 14 items evaluated using a 6-point Likert scale, ranging from "Strongly Disagree" to "Strongly Agree." This scale is split into two subscales: Fulfillment of Aspirational Desires and Absence of Social Fears. Additionally, participants provided open-ended responses about whether they would use the device, their social comfort if they used the device, and remarks on its appearance and functionality. Participants were also asked to rank their preferred form factors after viewing all the form factor videos.
5.2 Participants
168 participants were recruited through an online platform. 18 participants failed attention checks, leaving 150 participants (age range = 18 to 63, mean age = 28.4, SD = 8.4) who successfully completed the study. The participants were fluent in English. Participants also answered additional questions, found in Appendix C, which served as covariates in our analyses. These covered their usage habits and attitudes towards wearable technology, perspectives on digital privacy, and the frequency of using accessories like jewelry, glasses, and earbuds.
6 RESULTS AND DISCUSSION
6.1 Self Voice Activity Detection Performance
The self-VAD accuracy study yielded a detection error rate of 0.09 ± 0.03, with a false alarm rate of 0.07 ± 0.04, and a missed detection rate of 0.02 ± 0.02.
6.2 Social Acceptability
The analysis for the study was preregistered and can be found at www.aspredicted.org/71MFMQ.
6.2.1 Quantitative Results.
Comparing the social acceptability of the MeMic self-recording paradigm with continuous recording, we found a significant difference in three items of the WEAR Scale subscale "Absence of Social Fears", shown in Fig. 2a. No significant differences were found in the "Fulfillment of Aspirational Desires" subscale, which can be seen in Appendix D. Significance was measured using the Mann-Whitney U test, as the responses failed the Shapiro-Wilk normality test. The items on the scale with significant differences were:
• Use of this device raises privacy issues (p=.0362): MeMic M=4.76, SD=1.20; Continuous Recording M=5.08, SD=1.34.
• Wearing this device could be considered inappropriate (p=.0013): MeMic M=3.79, SD=1.49; Continuous Recording M=4.39, SD=1.17.
• People would not be offended by the wearing of this device (p=.0009): MeMic M=3.43, SD=1.17; Continuous Recording M=2.82, SD=1.12.
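For readers unfamiliar with the test, the per-item comparison above can be sketched as follows. This is an illustrative standard-library implementation of a two-sided Mann-Whitney U test using the tie-corrected normal approximation without continuity correction. The paper does not state which software or variant the authors used, so p-values from this sketch need not match the reported ones exactly, and the Likert responses below are made up.

```python
import math
from collections import Counter


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using a tie-corrected normal approximation."""
    counts = Counter(list(a) + list(b))
    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    position = 0
    for value, count in sorted(counts.items()):
        ranks[value] = position + (count + 1) / 2
        position += count
    n1, n2 = len(a), len(b)
    n = n1 + n2
    u_a = sum(ranks[x] for x in a) - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    # Ties shrink the variance of U; Likert data has many ties.
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # every response identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 6-point Likert responses for one WEAR Scale item.
memic = [3, 4, 4, 5, 3, 4, 2, 5]
continuous = [5, 6, 5, 4, 6, 5, 6, 4]
u_stat, p_value = mann_whitney_u(memic, continuous)
```

A rank-based test suits this data because Likert responses are ordinal and, as noted above, failed the normality check.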
Analyzing the mean WEAR Scale scores with an Analysis of Covariance (ANCOVA) found that controlling for two covariates made the differences between the conditions statistically significant. The covariates were:
• Perception of impact of wearable technology (p = 0.015). A subgroup analysis revealed that there was a significant difference when the participants felt ’Positive’ (N=77) (p = 0.017) about the impact of wearable technology.
• Importance of keeping up with technology trends (p = 0.0147). A subgroup analysis revealed that there was a significant difference when the participants ’Strongly Agree’ (N=23) (p = 0.041) that it is important to keep up with the latest trends in technology.
6.2.2 Qualitative Results.
The participants were asked two open-ended qualitative questions. The first asked users about their comfort using the device (the MeMic versus a continuous recording wearable, depending on their assigned condition). The second asked participants for any general feedback about the device.
To assess the qualitative data, participants’ feedback was first manually reviewed, with significant themes noted. Subsequently, GPT-4 was given all qualitative results and used to derive thematic codes that define recurrent themes across the feedback. The authors then assessed and refined these codes against their own manual review. GPT-4 then systematically assigned the refined codes to each response. The mean frequency of each code per condition was calculated, followed by statistical analysis to identify any significant differences between conditions. The prompt can be found in Appendix D.1.
Comfort Question: "Would you feel comfortable wearing this device in a social setting? Why or why not?".
The direct answers to the first question were first coded into groups of "Yes", "Maybe", and "No" (by GPT-4). A Mann-Whitney U test revealed a significant difference between the two conditions in the proportion of participants who answered "Yes": participants were more willing to wear the MeMic than the continuously recording wearable in a social setting (p=0.0174), as shown in Fig. 4. This result supports our hypothesis that the MeMic will have higher social acceptability than a continuously recording wearable.
Coding of the qualitative results revealed one significant theme, "Privacy Invasion". A Mann-Whitney U test showed a significant difference between conditions for this code (p=0.0007), with far fewer MeMic responses coded as "Privacy Invasion", as shown in Fig. 3. As privacy concerns are key to social acceptability, this result supports our hypothesis.
Overall, answers to this question discuss privacy and intrusiveness concerns, particularly regarding recording without consent and its impact on social dynamics. Apprehensions about the device’s bulky and conspicuous appearance are also prevalent. Some participants recognize the potential utility of the device in their lives.
Feedback Question: "Do you have general feedback about the look, functionality, or anything else about the device?". Overall, the feedback included desires for a more discreet and smaller design, concerns regarding privacy and the ethical implications of recording conversations, and the hope for the device to appear more fashionable. Additionally, there was skepticism about the device’s effectiveness and mixed views on its societal acceptance and potential impact on human behavior. The visibility of the recording indicator light was considered necessary for trust.
6.3 Form Factor Preferences
Rankings of the four MeMic options showed that the "With LED" Pendant Necklace was most preferred, with a mean ranking of 2.05 on a scale from 1 (most preferred) to 4 (least preferred). The second most preferred was the "No LED" Pendant Necklace with a mean ranking of 2.35. This indicates that the Pendant Necklace is the most preferred form factor, followed by behind-the-neck, and then glasses. The preferences and rankings are shown in Fig. 5.
Comparing "no LED" and "with LED" for the pendant necklace, we found a significant difference in privacy issues (p=0.00019, "no LED" M=4.94, SD=1.41, "with LED" M=3.74, SD=1.42) and consistency with self-image (p=0.020, "no LED" M=2.85, SD=1.40, "with LED" M=3.35, SD=1.45), shown in Fig. 6. These results reinforce our design decision to add an LED indicator to foster trust.
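The mean ranking used here is simply the average of the rank each participant gave an option, so lower is better. A minimal sketch with made-up responses follows; the function name `mean_rankings` and the option labels are illustrative, not taken from the study materials.

```python
from statistics import mean


def mean_rankings(responses):
    """Average rank per option; each response maps option -> rank (1 = most preferred)."""
    options = responses[0].keys()
    return {option: mean(r[option] for r in responses) for option in options}


# Hypothetical rankings from three participants.
responses = [
    {"pendant (LED)": 1, "pendant (no LED)": 2, "behind-the-neck": 3, "glasses": 4},
    {"pendant (LED)": 2, "pendant (no LED)": 1, "behind-the-neck": 4, "glasses": 3},
    {"pendant (LED)": 1, "pendant (no LED)": 3, "behind-the-neck": 2, "glasses": 4},
]
averages = mean_rankings(responses)
ordered = sorted(averages, key=averages.get)  # most preferred first
```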
7 LIMITATIONS AND FUTURE WORK
The self-VAD performance was measured with participants wearing the MeMic for a short timeframe, which may not accurately represent the experience of all-day use. Future studies should involve longitudinal usage of the MeMic to test its robustness in daily activities and physical comfort over extended periods. Social acceptability was measured by participants’ responses to a video demonstration of the MeMic. While an online study allows a large and diverse study population, real-world acceptability could differ. Understanding how acceptability changes when both parties in an interaction have access to the device could also provide valuable insights.
Furthermore, the MeMic could capture the speech of a bystander and background sounds if they occur simultaneously with the user’s speech. This is a limitation of the current design which can be addressed in future work using contact-based microphones and sensor-fusion filters. Finally, a limitation of our current design is the reliance on a tight mechanical coupling between the MeMic and a body part that vibrates significantly during speech. This requirement could potentially impact user comfort. Future developments should combine electromyography (EMG), time of flight (TOF), and other sensors with more sophisticated software designs such as machine learning to maintain the high accuracy of the self-VAD while allowing a more comfortable form factor.
8 CONCLUSION
This work introduces the MeMic, a novel wearable device designed to enhance privacy and social acceptability by recording audio if, and only if, the user is speaking. The lab study validated the MeMic’s accuracy in self-voice activity detection. An online study confirmed that the MeMic significantly improves social acceptability compared to continuously recording wearables. Additionally, the online study revealed a preference for the pendant necklace form factor and the LED indicator, used to communicate active recording to bystanders, thereby offering insights for future wearable designs. Our work contributes to a more socially acceptable adoption of all-day audio capture wearable devices in everyday life.
ACKNOWLEDGMENTS
We would like to thank Eyal Perry, Angela Vujic, Valdemar Danry, and Tomas Vega for their valuable feedback and critique throughout the design of the wearable and the study evaluation.
A SYSTEM DESIGN
\[ \text{self-VAD} = \begin{cases} 1, & \text{if } RMS > RMS\_threshold \text{ AND } ZC > ZC\_threshold \text{ AND } RRMS > RRMS\_threshold \\ 0, & \text{otherwise} \end{cases} \tag{2} \]
A.1 Mechanical and Electronic Design
Plastic was used for its light weight, robustness, and ease of prototyping. A matte black PLA was chosen to minimize the conspicuousness of the device. The cord of the necklace is a black, elastic string, chosen because the elastic lets the snug necklace maintain a tight fit without causing discomfort during throat movements. A spring-powered cord lock was used to allow quick donning and doffing of the device, and to accommodate various user sizes.
Data storage is handled by a microSD SPI breakout module, soldered directly to the XIAO board’s pins. The device is powered by a 105mAh LiPo battery and is charged via USB-C, aligning with common charging standards. The XIAO NRF52 Sense BLE was chosen for its compact size and integrated Bluetooth capability.
The device’s firmware is written in C++ Arduino. It handles data collection from the IMU, running the self-VAD algorithm locally, turning on the microphone and trust light when speech is detected, and saving or streaming of audio data.
B ACCURACY STUDY
C DEMOGRAPHIC QUESTIONS FOR ONLINE STUDY
Question: Gender
Options: Female, Male, Non-Binary
Question: How frequently do you wear neck jewelry?
Options: Never, Once a month, Once a week, A few times a week, Daily
Question: How frequently do you wear glasses?
Options: Never, Used to wear in the past, Sometimes, Always
Question: How frequently do you wear earbuds/earphones?
Options: Never, Once a month, Once a week, A few times a week, Daily
Question: How often do you access social media on a mobile phone?
Options: Never, Several times a month, Several times a week, Several times a day, All the time
Question: How do you perceive the impact of wearable technology (like smartwatches, fitness trackers, etc.) on people’s lives?
Options: Very Negative, Negative, Neutral, Positive, Very Positive
Question: How often do you use wearable technology (such as smartwatches, fitness trackers, etc.)?
Options: Never, Rarely, Sometimes, Often, Always
Question: I think it is important to keep up with the latest trends in technology.
Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Question: I worry about the possibility that my conversations will be overheard.
Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Question: Employers should be able to monitor employee email.
Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Question: I am concerned about the security of my personal information when using digital services (like social media, online shopping, etc.)?
Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
D ONLINE SOCIAL ACCEPTABILITY STUDY
The subset of WEAR Scale questions was:
(1) This device is consistent with my self image.
(2) Use of this device raises privacy issues.
(3) This device would enhance the wearer’s image.
(4) The wearer of this device would get a positive reaction from others.
D.1 Qualitative Coding Prompt
The following is the prompt used with GPT-4 to label the qualitative responses with codes:
Footnotes
⁎ Both authors contributed equally to this research.
REFERENCES
[1] Saurav Dubey, Arash Mahnan, and Jürgen Konczak. 2020. Real-time voice activity detection using neck-mounted accelerometers for controlling a wearable vibration device to treat speech impairment. In Frontiers in Biomedical Devices, Vol. 83549. American Society of Mechanical Engineers, V001T09A007.
[2] Saurav Kumar Dubey. 2019. Accelerometer-based real-time voice activity detection using neck surface vibration measurement. Ph. D. Dissertation. University of Minnesota.
[3] Julia C. Dunbar, Emily Bascom, Ashley Boone, and Alexis Hiniker. 2021. Is Someone Listening? Audio-Related Privacy Perceptions and Design Recommendations from Guardians, Pragmatists, and Cynics. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 3, Article 98 (Sep. 2021), 23 pages. https://doi.org/10.1145/3478091
[4] Yaagneshwaran Ganesh. 2023. Call recording laws: one party (two party) consent states - A look at the laws in detail. Avoma Blog (2023). https://www.avoma.com/blog/call-recording-laws
[5] Jun Gong, Lan Li, Daniel Vogel, and Xing-Dong Yang. 2017. Cito: An actuated smartwatch for extended interactions. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 5331–5345.
[6] Lixing He, Haozheng Hou, Shuyao Shi, Xian Shuai, and Zhenyu Yan. 2023. Towards Bone-Conducted Vibration Speech Enhancement on Head-Mounted Wearables. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services. 14–27.
[7] Thad Hughes and Keir Mierle. 2013. Recurrent neural networks for voice activity detection. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 7378–7382. https://doi.org/10.1109/ICASSP.2013.6639096
[8] Rashid Jahangir, Ying Wah Teh, Henry Friday Nweke, Ghulam Mujtaba, Mohammed Ali Al-Garadi, and Ihsan Ali. 2021. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Systems with Applications 171 (2021), 114591.
[9] Norene Kelly and Stephen Gilbert. 2016. The WEAR Scale: Developing a Measure of the Social Acceptability of a Wearable Device. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (San Jose, California, USA) (CHI EA ’16). Association for Computing Machinery, New York, NY, USA, 2864–2871. https://doi.org/10.1145/2851581.2892331
[10] Norene Kelly and Stephen Gilbert. 2018. The Wearer, the Device, and Its Use: Advances in Understanding the Social Acceptability of Wearables. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 62 (09 2018), 1027–1031. https://doi.org/10.1177/1541931218621237
[11] Sheilagh Kernaghan. 2016. Google glass: An evaluation of social acceptance. Unpublished doctoral dissertation (2016).
[12] Tatsuya Kitamura and Keisuke Ohtani. 2015. Non-contact measurement of facial surface vibration patterns during singing by scanning laser Doppler vibrometer. Frontiers in Psychology 6 (2015), 1682.
[13] James F Knight and Chris Baber. 2005. A tool to assess the comfort of wearable computers. Human Factors 47, 1 (2005), 77–91.
[14] Marion Koelle, Swamy Ananthanarayan, and Susanne Boll. 2020. Social acceptability in HCI: A survey of methods, measures, and design strategies. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–19.
[15] Marion Koelle, Torben Wallbaum, Wilko Heuten, and Susanne Boll. 2019. Evaluating a Wearable Camera’s Social Acceptability In-the-Wild. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI EA ’19). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/3290607.3312837
[16] Daryush D. Mehta, Matías Zañartu, Shengran W. Feng, Harold A. Cheyne II, and Robert E. Hillman. 2012. Mobile Voice Health Monitoring Using a Wearable Accelerometer Sensor and a Smartphone Platform. IEEE Transactions on Biomedical Engineering 59, 11 (2012), 3090–3096. https://doi.org/10.1109/TBME.2012.2207896
[17] Thomas Murry, William S Brown Jr, and Richard J Morris. 1995. Patterns of fundamental frequency for three types of voice samples. Journal of Voice 9, 3 (1995), 282–289.
[18] Halley Profita, Reem Albaghli, Leah Findlater, Paul Jaeger, and Shaun K Kane. 2016. The AT effect: how disability affects the perceived social acceptability of head-mounted display use. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 4884–4895.
[19] Marcos Serrano, Barrett M Ens, and Pourang P Irani. 2014. Exploring the use of hand-to-face input for interacting with head-worn displays. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 3181–3190.
[20] Jong Won Shin, Joon-Hyuk Chang, and Nam Soo Kim. 2010. Voice activity detection based on statistical models and machine learning approaches. Computer Speech & Language 24, 3 (2010), 515–530. https://doi.org/10.1016/j.csl.2009.02.003 Emergent Artificial Intelligence Approaches for Pattern Recognition in Speech and Language Processing.
[21] Jongseo Sohn, Nam Soo Kim, and Wonyong Sung. 1999. A statistical model-based voice activity detection. IEEE Signal Processing Letters 6, 1 (1999), 1–3. https://doi.org/10.1109/97.736233
[22] Silero Team. 2021. Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. https://github.com/snakers4/silero-vad.
[23] Ohini Kafui Toffa and Max Mignotte. 2021. Environmental Sound Classification Using Local Binary Pattern and Audio Features Collaboration. IEEE Transactions on Multimedia 23 (2021), 3978–3985. https://doi.org/10.1109/TMM.2020.3035275
[24] Hartmut Traunmüller and Anders Eriksson. 1995. The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript 11 (1995).
[25] Julie R Williamson, Andrew Crossan, and Stephen Brewster. 2011. Multimodal mobile interactions: usability studies in real world settings. In Proceedings of the 13th International Conference on Multimodal Interfaces. 361–368.
[26] Shubham Yadav, Patrice Abbie D. Legaspi, Mark S. Oude Alink, André B. J. Kokkeler, and Bram Nauta. 2023. Hardware Implementations for Voice Activity Detection: Trends, Challenges and Outlook. IEEE Transactions on Circuits and Systems I: Regular Papers 70, 3 (2023), 1083–1096. https://doi.org/10.1109/TCSI.2022.3225717
[27] Dongwen Ying, Yonghong Yan, Jianwu Dang, and Frank K. Soong. 2011. Voice Activity Detection Based on an Unsupervised Learning Framework. IEEE Transactions on Audio, Speech, and Language Processing 19, 8 (2011), 2624–2633. https://doi.org/10.1109/TASL.2011.2125953
[28] Xiao-Lei Zhang and Ji Wu. 2013. Deep Belief Networks Based Voice Activity Detection. IEEE Transactions on Audio, Speech, and Language Processing 21, 4 (2013), 697–710. https://doi.org/10.1109/TASL.2012.2229986
[29] Hao Zhong and Kaifeng Bu. 2022. Privacy-Utility Trade-Off. arXiv preprint arXiv:2204.12057 (2022).
Index Terms
- MeMic: Towards Social Acceptability of User-Only Speech Recording Wearables