Dataset articles

A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

Authors:

Simon Schwär
Michael Krause
Michael Fast
Sebastian Rosenzweig
Frank Scherbaum
Meinard Müller

Abstract

Larynx microphones (LMs) make it possible to obtain practically crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with substantial interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. In this paper, we introduce the task of reconstructing a natural-sounding, high-quality singing voice recording from an LM signal. With an explicit focus on the singing voice, the problem lies at the intersection of speech enhancement and singing voice synthesis, with the additional requirement of faithful reproduction of expressive parameters like intonation. In this context, we make three main contributions. First, we publish a dataset with over 4 hours of popular music we recorded with four amateur singers accompanied by a guitar, where both LM and clean close-up microphone signals are available. Second, we propose a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects. Third, we evaluate the baseline with a listening test and further show that it can improve the accuracy of lyrics transcription as an exemplary downstream task.

Keywords:

Larynx Microphone, Singing Voice Reconstruction, Dataset, Differentiable Signal Processing, Singing Analysis

Year: 2024
Volume: 7 Issue: 1
Page/Article: 30–43
DOI: 10.5334/tismir.166

Submitted on 10 Mar 2023
Accepted on 6 Jan 2024
Published on 23 Feb 2024

Peer Reviewed