CHI Conference Proceedings · Work in Progress · DOI: 10.1145/3613905.3650872

MeMic: Towards Social Acceptability of User-Only Speech Recording Wearables

Published: 11 May 2024

Abstract

Wearables that record audio continuously have applications ranging from health monitoring to cognitive augmentation. However, they raise serious privacy concerns among bystanders and conversation partners, reducing their social acceptability. To address this, we designed the MeMic, a wearable that records audio only when the user speaks, driven by a hardware-based voice activity detector. A visible light on the wearable indicates when it is actively recording, to build trust with others. We validated its performance in a lab study (N=12) in which participants wore the MeMic while performing tasks. Further, an online study (N=168) compared the social acceptability of the MeMic’s self-recording paradigm with continuous recording. We found significantly fewer social fears alongside reduced privacy concerns in the self-recording paradigm, improving social acceptability. We also explored different conceptual form factors (glasses, pendant necklace, and behind-the-neck) for the MeMic and found that the pendant necklace is the most preferred. This work contributes towards enhancing the social comfort of wearables that continuously capture users’ speech.

Figure 1:

Figure 1: Form factors of the MeMic, from left to right: Pendant necklace, Glasses, Earbuds/behind-the-neck, Snug-fit necklace


1 INTRODUCTION

Continuous, all-day capture of speech is critical for many applications in human-computer interaction (HCI). Research areas such as healthcare monitoring, mood sensing, memory and intelligence augmentation, and social behavior monitoring all benefit from access to users’ speech signals. Speech capture is often achieved by running continuous audio recording on a wearable device. However, when continuous capture systems run outside constrained settings such as research laboratories, the privacy of the recorded audio becomes a key issue for both bystanders and the user [3]. While speech privacy is paramount for bystanders, non-speech signals from the user’s environment can also reveal sensitive activities [23]. These issues decrease the social acceptability of such systems when deployed, as was the case for Google Glass [10, 11]. This is all the more important because several nations and states legally require all parties in a conversation to consent to being recorded [4].

To improve the social acceptability of continuous audio recording wearables, we designed the MeMic - a self voice-activity-detector (self-VAD) wearable that records only the voice of the user. The MeMic is focused on maximizing privacy by never sensing, recording, or processing any speech signal except those that originate from the user. The MeMic features a microphone that is turned off by default. It senses when the user is speaking, and only then does it turn on the microphone and record audio, with a visible light on the wearable indicating that it is recording. We hypothesize that the MeMic will have higher social acceptability than a continuous audio recording wearable.

We conducted a lab study (N=12) to evaluate the MeMic’s performance while participants wore it in a series of tasks, and measured a detection error rate of 0.09 ± 0.03. Further, to test if the MeMic is socially acceptable, we ran an online study (N=168), which showed that the MeMic had significantly improved social acceptability as compared to a continuously recording wearable.

In summary, this paper makes the following contributions:

The technical design and evaluation of a self-recording audio wearable using an accelerometer.

An assessment showing significantly higher social acceptability of a self-recording wearable compared to a continuous recording wearable, attributed to diminished social fears and privacy concerns.

An investigation into user perceptions and preferences regarding four different form factors of the self-recording wearable.


2 BACKGROUND AND RELATED WORK

2.1 Self Voice Activity Detection

Self Voice Activity Detection (self-VAD) can be defined as detecting whether the user, and only the user, is speaking. One method to exclusively recognize the user’s voice would combine voice activity detection (VAD) with speaker diarization and verification. VAD has been a well-researched area [20] ever since Sohn et al. [21] developed a statistical model to detect speech in 1999. Machine learning significantly enhanced VAD accuracy, employing methods such as Gaussian mixture models [28], recurrent neural networks [7], and deep belief networks [27]. However, these software-based methods typically record a segment of speech before processing it and deciding whether to save it. Additionally, speaker diarization and verification algorithms [8] can confirm the speaker’s identity. These approaches require that audio first be recorded before VAD can be performed, which presents privacy and legal concerns in situations where it is not appropriate to record others.

There also exist hardware-based VAD solutions [26]. One example is a contact-based VAD, which can be realized by placing an inertial measurement unit (IMU) in contact with the user’s skin and detecting speech-related vibrations originating from the user. Although the concept of contact-based VAD has been demonstrated in prior work [1, 2, 16], its application in wearable, privacy-preserving devices has not been explored.

2.2 Social Acceptability in Wearables

With wearable devices becoming more ubiquitous and used in social contexts, the HCI community has grown increasingly interested in understanding their social acceptability [14]. This is especially relevant for wearables that continuously record audio, as they can impact the privacy of those around the wearer [3]. There are several methods to assess the social acceptability of wearables, including diary studies [15], online surveys [5, 18], and lab tests [19]. The WEAR Scale, created by Kelly et al. [9], offers a standardized way to evaluate this aspect through a survey. It has been applied to existing devices like the Apple Watch, Google Glass, and brain-sensing headsets, allowing for consistent comparisons of wearables’ social acceptability [10].

The perceived utility of a wearable can influence its social acceptability, as shown by Profita et al. [18] through an online study in which the presence of a disability changed how bystanders viewed a wearable. Additionally, Williamson et al. [25] suggest that interactions that explain a device’s function are more socially acceptable, inspiring the MeMic’s LED indicator of active recording.


3 SYSTEM DESIGN

The MeMic is designed with the core principle of privacy, which necessitates that the device remains entirely inactive by default. This means that no audio data is captured or processed unless it is specifically activated by the user speaking. The MeMic uses an IMU for detecting speech. When the IMU senses specific signals associated with speaking, it triggers the microphone to turn on. This ensures that the microphone only records when the user is actively speaking. Once the user stops speaking, the IMU no longer detects the speech-related vibration/movements, and the microphone is turned off. To maintain transparency and respect others’ privacy, a light indicator on the device turns on whenever the microphone is active, signaling that recording is in progress. The recorded data can be stored on an SD card within the device or transmitted to a smartphone via Bluetooth.
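The capture flow described above can be sketched as a small state machine. The Python below is illustrative only: the real firmware is C++ Arduino (Appendix A), and `self_vad`, `mic`, and `trust_light` are hypothetical stand-ins for the IMU classifier and GPIO drivers.

```python
class Toggle:
    """Hypothetical stand-in for the microphone or trust-light driver;
    records the on/off events it receives so behavior can be inspected."""
    def __init__(self):
        self.events = []
    def on(self):
        self.events.append("on")
    def off(self):
        self.events.append("off")


def run_device_loop(imu_windows, self_vad, mic, trust_light):
    """Gate the mic and trust light on self-speech, one IMU window at a time.

    No audio is touched here: only IMU windows are inspected, and the
    microphone is switched on strictly after speech is detected.
    """
    recording = False
    for window in imu_windows:
        speaking = self_vad(window)
        if speaking and not recording:
            mic.on()             # audio capture starts only now...
            trust_light.on()     # ...and is signalled to bystanders
            recording = True
        elif not speaking and recording:
            mic.off()
            trust_light.off()
            recording = False
```

Running the loop over a sequence of speech/no-speech windows toggles both outputs together, mirroring the trust light tracking the microphone state.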

3.1 Form Factors

We explored multiple form factors for the MeMic, as shown in Fig. 1. Prior work [6, 12] shows that the speech vibration signal can be obtained from the chest, ears, nose, and neck. As people’s form factor preferences vary, we made demonstration versions of the MeMic in several form factors, which we assessed in our social acceptability study. The four form factors, shown in Fig. 1, included a snug necklace (a necklace-like device designed to fit snugly around the neck), a pendant necklace, glasses, and earbuds whose connecting electronics circle behind the user’s head/neck. For the technical evaluation in this paper, we used the snug necklace only, to ensure optimal contact with the larynx.

3.2 Mechanical Design and Electronics

The necklace was designed to be lightweight and small so as not to cause discomfort over long periods of wear [13]. The necklace MeMic weighs 20 grams and measures 43 × 14 mm. The MeMic’s core electronics are built around the Seeeduino XIAO NRF52 Sense BLE microcontroller breakout board, powered by a 105 mAh LiPo battery. Further details on the materials used and firmware can be found in Appendix A.

3.3 Voice Activity Detection Algorithm

The self voice activity detection (self-VAD) algorithm of the MeMic, running locally on the NRF52, begins with a sliding window that holds the most recent IMU data. At a specified interval, a bandpass filter (BPF) is applied to the window of IMU data, narrowing the frequency range to the voice-related band of 85 Hz to 400 Hz [17, 24]. Three features are then calculated on the z-axis acceleration of the IMU: root mean square (RMS), zero crossings (ZC), and rate of change of RMS (RRMS). Higher values correlate with active speaking, and thresholds for the three metrics were chosen by hand through iterative testing. The microphone is activated only when RMS, ZC, and RRMS all exceed their respective thresholds. The thresholds used were \(RMS\_threshold=0.0225\), \(ZC\_threshold=0.21\), and \(RRMS\_threshold=0.0195\). The sliding window held 40 samples at an IMU sampling rate of 833 Hz, with the BPF applied every 50 ms.
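A minimal sketch of the three window features and the threshold rule, written in Python rather than the device’s C++ firmware. Two assumptions are made: the window is taken to be already bandpass-filtered, and ZC is interpreted as a zero-crossing rate (fraction of sign changes), since the reported threshold of 0.21 suggests a normalized quantity.

```python
import math

# Thresholds as reported in the paper.
RMS_T, ZC_T, RRMS_T = 0.0225, 0.21, 0.0195


def rms(z):
    """Root mean square of one window of z-axis acceleration."""
    return math.sqrt(sum(v * v for v in z) / len(z))


def zero_crossing_rate(z):
    """Fraction of adjacent sample pairs whose signs differ (assumption:
    ZC in the paper is this normalized rate, not a raw count)."""
    return sum(1 for a, b in zip(z, z[1:]) if a * b < 0) / (len(z) - 1)


def self_vad(z, prev_rms):
    """Return (speaking?, current RMS); the caller carries RMS between
    windows so the rate of change (RRMS) can be computed."""
    r = rms(z)
    rrms = abs(r - prev_rms)  # rate of change of RMS across windows
    speaking = r > RMS_T and zero_crossing_rate(z) > ZC_T and rrms > RRMS_T
    return speaking, r
```

With a 40-sample window at 833 Hz, `self_vad` would be invoked roughly every 50 ms, matching the interval stated above.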

3.4 Trust Light

The "trust light" on the MeMic, prominently positioned on each form factor, acts as a real-time visual cue to bystanders indicating active recording. It illuminates when the device detects speech and starts recording, and turns off once speech ends, providing immediate, transparent feedback. This feature aims to foster trust and social acceptability by clearly signaling when recording occurs.


4 USER STUDY - SELF VOICE ACTIVITY DETECTION ACCURACY

To evaluate the performance of the MeMic’s self-VAD, we designed a study where participants would wear the MeMic and the self-VAD performance would be assessed during a variety of speaking, listening, and movement activities. These activities were designed to mimic real-life activities and interactions that a user would engage in while using a MeMic.

4.1 Study Design and Metrics

Participants, wearing the "snug necklace form factor" MeMic (chosen for optimal contact with the larynx), sat in front of a webcam-equipped laptop. Instructions were provided via a web interface as shown in Appendix B. The study involved three activities: reading text aloud, performing movements (nodding, swaying, and drinking water), and silently listening to 30 seconds of pre-recorded speech.

A standard measure of VAD performance is the detection error rate, described in Equation 1. It is the sum of the false alarm duration (non-self-speech incorrectly classified as self-speech) and the missed detection duration (self-speech incorrectly classified as non-self-speech), divided by the total duration of self-speech in the ground truth. Ground truth was established by applying a proven voice activity detection algorithm [22] to each task’s recorded webcam audio. We then compared this to the MeMic’s own voice activity detection, inferred by using a light detector to determine when the device’s "trust light" (which activates when the MeMic detects speech) is illuminated in the video frames.

(1) \(\begin{equation} \text{Detection Error Rate} = \frac{\text{False Alarms} + \text{Missed Detections}}{\text{Total Duration of Self-Speech}} \end{equation}\)
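As a minimal sketch, the metric in Equation 1 can be computed over per-frame boolean labels; the frame length cancels out of the ratio, so counts suffice. The frame-level alignment of ground truth and trust-light predictions is our assumption.

```python
def detection_error_rate(ground_truth, predicted):
    """Equation 1 over equal-length per-frame labels (True = self-speech).

    False alarms: predicted speech where the ground truth has none.
    Missed detections: ground-truth speech that was not predicted.
    Denominator: total ground-truth self-speech, in frames.
    """
    false_alarms = sum(1 for g, p in zip(ground_truth, predicted) if p and not g)
    missed = sum(1 for g, p in zip(ground_truth, predicted) if g and not p)
    total_self_speech = sum(ground_truth)
    return (false_alarms + missed) / total_self_speech
```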

4.2 Participants

12 participants (age range 23-34, M = 26.0, SD = 3.2) were recruited from the university mailing list. A near-balanced sex distribution (7 male, 5 female) was recruited because voice fundamental frequencies differ between the sexes. All participants were fluent in English and had normal or corrected-to-normal speech and hearing.


5 USER STUDY - SOCIAL ACCEPTABILITY

We designed a study to examine the social acceptability of the MeMic. Participants were shown pre-recorded videos of a person using the MeMic in three everyday situations. Use cases were shown to address the privacy-utility tradeoff [29]: (i) a memory assistant (automatic ’to-do list’ creation), (ii) a communication skills improvement assistant, and (iii) real-time mood monitoring. This video demonstrated the pendant necklace form factor. To activate and deactivate the MeMic’s light when speaking, the actor pressed a hidden button. The video (Video A) contained a narration that explained how self-recording in the MeMic worked and what the trust light means.

For a comparison of the self-recording capability of MeMic with a continuously recording wearable, we made Video B by modifying Video A such that the MeMic’s light was constantly on. This version of the video also had narration, which described that the microphone was continuously on, and described what the trust light meant. These videos (A and B), identical in all regards except for the light on the MeMic and narration, contrasted the social acceptability of self-VAD versus continuous voice recording. Both videos are available in the supplementary material.

Further, to gauge user preferences for the MeMic’s different designs, we created videos displaying alternative form factors not shown in the main video (glasses, behind-the-neck earbuds, and a pendant necklace without an LED).

5.1 Study Design and Metrics

In a between-subjects study, participants were randomly assigned to either the MeMic condition (Video A) or the continuous recording condition (Video B). After viewing their assigned video, they assessed its social acceptability using the WEAR Scale survey. Subsequently, they viewed videos of other MeMic form factors and responded to a selected subset of four WEAR Scale survey questions. The procedure and subset are shown in Appendix D.

The primary measure was the WEAR Scale [9] which consists of 14 items evaluated using a 6-point Likert scale, ranging from "Strongly Disagree" to "Strongly Agree." This scale is split into two subscales: Fulfillment of Aspirational Desires and Absence of Social Fears. Additionally, participants provide open-ended responses about whether they would use the device, their social comfort if they used the device, and remarks on its appearance and functionality. Participants were also asked to rank their preferred form factors after viewing all the form factor videos.

5.2 Participants

168 participants were recruited through an online platform. 18 failed attention checks, leaving 150 participants (age range 18-63, M = 28.4, SD = 8.4) who successfully completed the study. The participants were fluent in English. Participants also answered additional questions (Appendix C) that served as covariates in our analyses, covering their usage habits and attitudes towards wearable technology, perspectives on digital privacy, and the frequency of using accessories like jewelry, glasses, and earbuds.


6 RESULTS AND DISCUSSION

Figure 2:

Figure 2: Differences in WEAR scale responses between the two conditions

6.1 Self Voice Activity Detection Performance

The self-VAD accuracy study yielded a detection error rate of 0.09 ± 0.03, with a false alarm rate of 0.07 ± 0.04, and a missed detection rate of 0.02 ± 0.02.

6.2 Social Acceptability

The analysis for the study was preregistered and can be found at www.aspredicted.org/71MFMQ.

6.2.1 Quantitative Results.

Comparing the social acceptability of the MeMic’s self-recording paradigm with continuous recording, we found significant differences in three items of the WEAR Scale subscale "Absence of Social Fears", shown in Fig. 2a. No significant differences were found in the "Fulfillment of Aspirational Desires" subscale (Appendix D). Significance was measured using the Mann-Whitney U test, as the responses failed the Shapiro-Wilk normality test. The items with significant differences were:

Use of this device raises privacy issues. (p = .0362; MeMic M = 4.76, SD = 1.20; Continuous Recording M = 5.08, SD = 1.34)

Wearing this device could be considered inappropriate. (p = .0013; MeMic M = 3.79, SD = 1.49; Continuous Recording M = 4.39, SD = 1.17)

People would not be offended by the wearing of this device. (p = .0009; MeMic M = 3.43, SD = 1.17; Continuous Recording M = 2.82, SD = 1.12)
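The Mann-Whitney U statistic behind these comparisons can be computed directly from midranks. The sketch below shows only the statistic; in practice a library implementation (e.g. `scipy.stats.mannwhitneyu`) would supply the p-value, and we do not know which implementation the authors used.

```python
def midranks(values):
    """1-based ranks of `values`, averaging ranks within tie groups
    (the standard treatment of ties for the Mann-Whitney U test)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: rank-sum of x minus
    its minimum possible rank-sum n1*(n1+1)/2."""
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[: len(x)])
    return r1 - len(x) * (len(x) + 1) / 2
```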

Analyzing the mean WEAR Scale scores with an Analysis of Covariance (ANCOVA) found that controlling for two covariates made the differences between the conditions statistically significant. The covariates were:

Perception of impact of wearable technology. (p = 0.015). A subgroup analysis revealed that there was a significant difference when the participants felt ’Positive’ (N=77) (p = 0.017) about the impact of wearable technology.

Importance of keeping up with technology trends. (p = 0.0147). A subgroup analysis revealed that there was a significant difference when the participants ’Strongly Agree’ (N=23) (p = 0.041) that it is important to keep up with the latest trends in technology.

Figure 3:

Figure 3: Qualitative feedback of MeMic

6.2.2 Qualitative Results.

The participants were asked two open-ended qualitative questions. The first asked users about their comfort using the device (the MeMic versus a continuous recording wearable, depending on their assigned condition). The second asked participants for any general feedback about the device.

To assess the qualitative data, participants’ feedback was first manually reviewed and significant themes noted. GPT-4 was then given all qualitative responses and used to derive thematic codes capturing recurrent themes across the feedback. The authors manually assessed and refined these codes, after which GPT-4 systematically assigned them to each response. The mean frequency of each code per condition was calculated, followed by statistical analysis to ascertain any significant differences between conditions. The prompt can be found in Appendix D.1.

Comfort Question: "Would you feel comfortable wearing this device in a social setting? Why or why not?".

The direct answers to the first question were coded into groups of "Yes", "Maybe", and "No" (by GPT-4). A Mann-Whitney U test revealed a significant difference between the two conditions for participants who answered "Yes": participants are more likely to wear the MeMic than the continuously recording wearable in a social setting (p = 0.0174), shown in Fig. 4. This result supports our hypothesis that the MeMic has higher social acceptability than a continuously recording wearable.

Coding of the qualitative responses revealed a single significant theme, "Privacy Invasion": a Mann-Whitney U test showed significantly fewer "Privacy Invasion"-coded responses for the MeMic than for the continuous recording condition (p = 0.0007), as shown in Fig. 3. As privacy concerns are key to social acceptability, this result supports our hypothesis.

Overall, answers to this question discuss privacy and intrusiveness concerns, particularly regarding recording without consent and its impact on social dynamics. Apprehensions about the device’s bulky and conspicuous appearance are also prevalent. Some participants recognize the potential utility of the device in their lives.

Figure 4:

Figure 4: Would you wear the device in a social setting?

Feedback Question: "Do you have general feedback about the look, functionality, or anything else about the device?". Overall, the feedback included desires for a more discreet and smaller design, concerns regarding privacy and the ethical implications of recording conversations, and the hope for the device to appear more fashionable. Additionally, there was skepticism about the device’s effectiveness and mixed views on its societal acceptance and potential impact on human behavior. The visibility of the recording indicator light was considered necessary for trust.

Figure 5:

Figure 5: Average Preference Ranking of Form Factors of the MeMic

6.3 Form Factor Preferences

The preference rankings between the four options of the MeMic revealed that the "With LED" pendant necklace was most preferred, with a mean ranking of 2.05, where rankings ranged from 1 (most preferred) to 4 (least preferred). The second most preferred was the "No LED" pendant necklace, with a mean ranking of 2.35. This indicates that the pendant necklace is the most preferred form factor, followed by behind-the-neck and then glasses. The preferences and rankings are shown in Fig. 5.

Figure 6:

Figure 6: Measuring the effect of the LED

Comparing the "no LED" and "with LED" pendant necklaces, we found significant differences in privacy issues (p = 0.00019; "no LED" M = 4.94, SD = 1.41; "with LED" M = 3.74, SD = 1.42) and consistency with self-image (p = 0.020; "no LED" M = 2.85, SD = 1.40; "with LED" M = 3.35, SD = 1.45), shown in Fig. 6. These results reinforce our design decision to add an LED indicator to foster trust.


7 LIMITATIONS AND FUTURE WORK

The self-VAD performance was measured with participants wearing the MeMic for a short timeframe, which may not accurately represent the experience of all-day use. Future studies should involve longitudinal usage of the MeMic to test its robustness in daily activities and its physical comfort over extended periods. Social acceptability was measured by participants’ responses to a video demonstration of the MeMic. While an online study allows a large and diverse study population, real-world acceptability could differ. Understanding how acceptability changes when both parties in an interaction have access to the device could also provide valuable insights.

Furthermore, the MeMic could capture the speech of a bystander and background sounds if they occur simultaneously with the user’s speech. This is a limitation of the current design that could be addressed in future work using contact-based microphones and sensor-fusion filters. Finally, the current design relies on a tight mechanical coupling between the MeMic and a body part that vibrates significantly during speech, which could impact user comfort. Future developments should combine electromyography (EMG), time-of-flight (TOF), and other sensors with more sophisticated software, such as machine learning, to maintain the high accuracy of the self-VAD while enabling a more comfortable form factor.


8 CONCLUSION

This work introduces the MeMic, a novel wearable device designed to enhance privacy and social acceptability by recording audio if, and only if, the user is speaking. The lab study validated the MeMic’s accuracy in self-voice activity detection. An online study confirmed that the MeMic significantly improves social acceptability compared to continuously recording wearables. Additionally, the online study revealed a preference for the pendant necklace form factor and the LED indicator, used to communicate active recording to bystanders, thereby offering insights for future wearable designs. Our work contributes to a more socially acceptable adoption of all-day audio capture wearable devices in everyday life.


ACKNOWLEDGMENTS

We would like to thank Eyal Perry, Angela Vujic, Valdemar Danry, and Tomas Vega for their valuable feedback and critique throughout the design of the wearable and the study evaluation.

A SYSTEM DESIGN

(2) \(\begin{equation} \text{self-VAD} = \begin{cases} 1, & \text{if } RMS > RMS\_threshold \text{ AND } ZC > ZC\_threshold \text{ AND } RRMS > RRMS\_threshold \\ 0, & \text{otherwise} \end{cases} \end{equation}\)

A.1 Mechanical and Electronic Design

Plastic was used for its light weight, robustness, and ease of prototyping. A matte black PLA was chosen to minimize the conspicuousness of the device. The cord of the necklace is a black elastic string, chosen because the elastic allows the snug fit to be maintained without causing discomfort during throat movements. A spring-powered cord lock allows quick donning and doffing of the device and accommodates various user sizes.

Data storage is handled by a microSD SPI breakout module soldered directly to the XIAO board’s pins, ensuring efficient data handling. The device is powered by a 105 mAh LiPo battery charged via USB-C, aligning with common charging standards. The XIAO NRF52 Sense BLE was chosen for its compact size and integrated Bluetooth capability.

The device’s firmware is written in C++ using the Arduino framework. It handles data collection from the IMU, running the self-VAD algorithm locally, turning on the microphone and trust light when speech is detected, and saving or streaming audio data.

B ACCURACY STUDY

Figure 7:

Figure 7: A participant engaging in the accuracy study

Figure 8:

Figure 8: The procedure for the lab accuracy study of MeMic

C DEMOGRAPHIC QUESTIONS FOR ONLINE STUDY

Question: Gender

Options: Female, Male, Non-Binary

Question: How frequently do you wear neck jewelry?

Options: Never, Once a month, Once a week, A few times a week, Daily

Question: How frequently do you wear glasses?

Options: Never, Used to wear in the past, Sometimes, Always

Question: How frequently do you wear earbuds/earphones?

Options: Never, Once a month, Once a week, A few times a week, Daily

Question: How often do you access social media on a mobile phone?

Options: Never, Several times a month, Several times a week, Several times a day, All the time

Question: How do you perceive the impact of wearable technology (like smartwatches, fitness trackers, etc.) on people’s lives?

Options: Very Negative, Negative, Neutral, Positive, Very Positive

Question: How often do you use wearable technology (such as smartwatches, fitness trackers, etc.)?

Options: Never, Rarely, Sometimes, Often, Always

Question: I think it is important to keep up with the latest trends in technology.

Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree

Question: I worry about the possibility that my conversations will be overheard.

Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree

Question: Employers should be able to monitor employee email.

Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree

Question: I am concerned about the security of my personal information when using digital services (like social media, online shopping, etc.)?

Options: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree

D ONLINE SOCIAL ACCEPTABILITY STUDY

Figure 9:

Figure 9: The procedure for the online social acceptability study of MeMic

Figure 10:

Figure 10: Differences in WEAR Scale Fulfillment of Aspirational Desires subscale between the two conditions

The subset of WEAR Scale questions was:

1. This device is consistent with my self-image.

2. Use of this device raises privacy issues.

3. This device would enhance the wearer’s image.

4. The wearer of this device would get a positive reaction from others.

D.1 Qualitative Coding Prompt

The following is the prompt used with GPT-4 to label the qualitative responses with codes:

Footnotes

  1. Both authors contributed equally to this research.

Supplemental Material

Talk Video: 3613905.3650872-talk-video.mp4 (51.3 MB)

References

  1. Saurav Dubey, Arash Mahnan, and Jürgen Konczak. 2020. Real-time voice activity detection using neck-mounted accelerometers for controlling a wearable vibration device to treat speech impairment. In Frontiers in Biomedical Devices, Vol. 83549. American Society of Mechanical Engineers, V001T09A007.
  2. Saurav Kumar Dubey. 2019. Accelerometer-based real-time voice activity detection using neck surface vibration measurement. Ph.D. Dissertation. University of Minnesota.
  3. Julia C. Dunbar, Emily Bascom, Ashley Boone, and Alexis Hiniker. 2021. Is Someone Listening? Audio-Related Privacy Perceptions and Design Recommendations from Guardians, Pragmatists, and Cynics. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 3, Article 98 (Sep 2021), 23 pages. https://doi.org/10.1145/3478091
  4. Yaagneshwaran Ganesh. 2023. Call recording laws: one party (two party) consent states - A look at the laws in detail. Avoma Blog (2023). https://www.avoma.com/blog/call-recording-laws
  5. Jun Gong, Lan Li, Daniel Vogel, and Xing-Dong Yang. 2017. Cito: An actuated smartwatch for extended interactions. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 5331–5345.
  6. Lixing He, Haozheng Hou, Shuyao Shi, Xian Shuai, and Zhenyu Yan. 2023. Towards Bone-Conducted Vibration Speech Enhancement on Head-Mounted Wearables. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services. 14–27.
  7. Thad Hughes and Keir Mierle. 2013. Recurrent neural networks for voice activity detection. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 7378–7382. https://doi.org/10.1109/ICASSP.2013.6639096
  8. Rashid Jahangir, Ying Wah Teh, Henry Friday Nweke, Ghulam Mujtaba, Mohammed Ali Al-Garadi, and Ihsan Ali. 2021. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Systems with Applications 171 (2021), 114591.
  9. Norene Kelly and Stephen Gilbert. 2016. The WEAR Scale: Developing a Measure of the Social Acceptability of a Wearable Device. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (San Jose, California, USA) (CHI EA ’16). Association for Computing Machinery, New York, NY, USA, 2864–2871. https://doi.org/10.1145/2851581.2892331
  10. Norene Kelly and Stephen Gilbert. 2018. The Wearer, the Device, and Its Use: Advances in Understanding the Social Acceptability of Wearables. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 62 (09 2018), 1027–1031. https://doi.org/10.1177/1541931218621237
  11. Sheilagh Kernaghan. 2016. Google glass: An evaluation of social acceptance. Unpublished doctoral dissertation (2016).
  12. Tatsuya Kitamura and Keisuke Ohtani. 2015. Non-contact measurement of facial surface vibration patterns during singing by scanning laser Doppler vibrometer. Frontiers in Psychology 6 (2015), 1682.
  13. James F. Knight and Chris Baber. 2005. A tool to assess the comfort of wearable computers. Human Factors 47, 1 (2005), 77–91.
  14. Marion Koelle, Swamy Ananthanarayan, and Susanne Boll. 2020. Social acceptability in HCI: A survey of methods, measures, and design strategies. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–19.
  15. Marion Koelle, Torben Wallbaum, Wilko Heuten, and Susanne Boll. 2019. Evaluating a Wearable Camera’s Social Acceptability In-the-Wild. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI EA ’19). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/3290607.3312837
  16. Daryush D. Mehta, Matías Zañartu, Shengran W. Feng, Harold A. Cheyne II, and Robert E. Hillman. 2012. Mobile Voice Health Monitoring Using a Wearable Accelerometer Sensor and a Smartphone Platform. IEEE Transactions on Biomedical Engineering 59, 11 (2012), 3090–3096. https://doi.org/10.1109/TBME.2012.2207896
  17. Thomas Murry, William S. Brown Jr., and Richard J. Morris. 1995. Patterns of fundamental frequency for three types of voice samples. Journal of Voice 9, 3 (1995), 282–289.
  18. Halley Profita, Reem Albaghli, Leah Findlater, Paul Jaeger, and Shaun K. Kane. 2016. The AT effect: how disability affects the perceived social acceptability of head-mounted display use. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 4884–4895.
  19. Marcos Serrano, Barrett M. Ens, and Pourang P. Irani. 2014. Exploring the use of hand-to-face input for interacting with head-worn displays. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 3181–3190.
  20. Jong Won Shin, Joon-Hyuk Chang, and Nam Soo Kim. 2010. Voice activity detection based on statistical models and machine learning approaches. Computer Speech, Language 24, 3 (2010), 515–530. https://doi.org/10.1016/j.csl.2009.02.003 Emergent Artificial Intelligence Approaches for Pattern Recognition in Speech and Language Processing.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jongseo Sohn, Nam Soo Kim, and Wonyong Sung. 1999. A statistical model-based voice activity detection. IEEE Signal Processing Letters 6, 1 (1999), 1–3. https://doi.org/10.1109/97.736233Google ScholarGoogle ScholarCross RefCross Ref
  22. Silero Team. 2021. Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. https://github.com/snakers4/silero-vad.Google ScholarGoogle Scholar
  23. Ohini Kafui Toffa and Max Mignotte. 2021. Environmental Sound Classification Using Local Binary Pattern and Audio Features Collaboration. IEEE Transactions on Multimedia 23 (2021), 3978–3985. https://doi.org/10.1109/TMM.2020.3035275Google ScholarGoogle ScholarCross RefCross Ref
  24. Hartmut Traunmüller and Anders Eriksson. 1995. The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript 11 (1995).Google ScholarGoogle Scholar
  25. Julie R Williamson, Andrew Crossan, and Stephen Brewster. 2011. Multimodal mobile interactions: usability studies in real world settings. In Proceedings of the 13th international conference on multimodal interfaces. 361–368.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Shubham Yadav, Patrice Abbie D. Legaspi, Mark S. Oude Alink, André B. J. Kokkeler, and Bram Nauta. 2023. Hardware Implementations for Voice Activity Detection: Trends, Challenges and Outlook. IEEE Transactions on Circuits and Systems I: Regular Papers 70, 3 (2023), 1083–1096. https://doi.org/10.1109/TCSI.2022.3225717Google ScholarGoogle ScholarCross RefCross Ref
  27. Dongwen Ying, Yonghong Yan, Jianwu Dang, and Frank K. Soong. 2011. Voice Activity Detection Based on an Unsupervised Learning Framework. IEEE Transactions on Audio, Speech, and Language Processing 19, 8 (2011), 2624–2633. https://doi.org/10.1109/TASL.2011.2125953Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Xiao-Lei Zhang and Ji Wu. 2013. Deep Belief Networks Based Voice Activity Detection. IEEE Transactions on Audio, Speech, and Language Processing 21, 4 (2013), 697–710. https://doi.org/10.1109/TASL.2012.2229986Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Hao Zhong and Kaifeng Bu. 2022. Privacy-Utility Trade-Off. arXiv preprint arXiv:2204.12057 (2022).Google ScholarGoogle Scholar

Published in

CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems
May 2024, 4761 pages
ISBN: 9798400703317
DOI: 10.1145/3613905

Copyright © 2024 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 11 May 2024

Qualifiers: Work in Progress; Research; Refereed limited

Acceptance Rates: Overall acceptance rate 6,164 of 23,696 submissions, 26%