Lyrics recognition from singing voice is one of the most important
techniques for query-by-singing music information retrieval systems.
Retrieval that uses lyrics information achieves higher performance than
retrieval that uses melody information alone.
However, recognizing
song lyrics from a singing voice is very difficult. To improve
recognition, we propose a new method that focuses on the correspondence
between the voice and the notes. Note boundary scores are calculated for each
frame and appended to the feature vectors as additional dimensions.
A marker HMM is defined to model the feature
vectors located at note boundaries, and it is inserted
between all morae in the pronunciation dictionary. As a result, the recognizer
constrains each mora to correspond to only one note.
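The two steps just described, extending each feature vector with a note boundary score and inserting a marker unit between morae in the dictionary, can be illustrated with a minimal sketch. It is not the authors' implementation: the label `mk`, the function names, and the choice to place markers only between adjacent morae (not after the final mora of a word) are assumptions made for illustration.

```python
from typing import Dict, List

MARKER = "mk"  # hypothetical label for the marker HMM (name is an assumption)


def append_boundary_score(frames: List[List[float]],
                          scores: List[float]) -> List[List[float]]:
    """Extend each frame's feature vector by one dimension holding its score."""
    if len(frames) != len(scores):
        raise ValueError("exactly one boundary score per frame is required")
    return [frame + [score] for frame, score in zip(frames, scores)]


def insert_markers(morae: List[str]) -> List[str]:
    """Place the marker unit between every pair of adjacent morae (assumed layout)."""
    expanded: List[str] = []
    for i, mora in enumerate(morae):
        if i > 0:
            expanded.append(MARKER)
        expanded.append(mora)
    return expanded


def expand_dictionary(lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply marker insertion to every entry of a word-to-morae dictionary."""
    return {word: insert_markers(morae) for word, morae in lexicon.items()}
```

For example, under these assumptions `insert_markers(["sa", "ku", "ra"])` yields `["sa", "mk", "ku", "mk", "ra"]`, so a decoder must pass through one boundary unit between consecutive morae.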
We also modified the
marker HMM to account for short pauses at particular positions.
A short pause corresponding to a musical rest or a breath may occur after
any mora, even inside a word. The short pause HMM is therefore concatenated
to the marker HMM, and a skip transition arc that bypasses the short pause
HMM is also introduced, so that the pause remains optional.
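The effect of the skip arc can be sketched by enumerating the unit sequences it permits: at every mora boundary the path contains the marker, optionally followed by a short pause. This is only an illustration under stated assumptions. The labels `mk` and `sp`, the function names, and the ordering (short pause after the marker) are hypothetical; the abstract does not specify the order, and a real recognizer would encode the choice as a transition arc instead of listing paths.

```python
from itertools import product
from typing import List

MARKER = "mk"       # hypothetical label for the marker HMM
SHORT_PAUSE = "sp"  # hypothetical label for the short pause HMM


def boundary_alternatives() -> List[List[str]]:
    """Units allowed at a mora boundary: marker alone (skip arc taken),
    or marker followed by a short pause (assumed ordering)."""
    return [[MARKER], [MARKER, SHORT_PAUSE]]


def pronunciation_paths(morae: List[str]) -> List[List[str]]:
    """Enumerate every unit sequence permitted for one dictionary entry."""
    if not morae:
        return []
    paths: List[List[str]] = []
    # One independent choice per boundary between adjacent morae.
    for choice in product(boundary_alternatives(), repeat=len(morae) - 1):
        path = [morae[0]]
        for boundary, mora in zip(choice, morae[1:]):
            path.extend(boundary)
            path.append(mora)
        paths.append(path)
    return paths
```

A word with three morae has two boundaries and therefore four permitted paths in this sketch, ranging from no pause at all to a pause after each of the first two morae.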
In experiments,
the proposed model achieved higher word accuracy than the baseline
model, improving it from 85.71% to 93.18%, which corresponds to a 52.3%
relative reduction in word error rate. Insertion errors in particular
were drastically reduced.
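The 52.3% figure can be checked from the two reported accuracies, assuming the usual definition in which word error rate equals 100% minus word accuracy. The function name below is hypothetical and the snippet is only an arithmetic check.

```python
def relative_wer_reduction(acc_baseline: float, acc_proposed: float) -> float:
    """Relative word error rate reduction in percent.

    Assumes WER = 100 - word accuracy, with both values given in percent.
    """
    wer_baseline = 100.0 - acc_baseline
    wer_proposed = 100.0 - acc_proposed
    if wer_baseline <= 0.0:
        raise ValueError("baseline word error rate must be positive")
    return 100.0 * (wer_baseline - wer_proposed) / wer_baseline


# Word error rate falls from 14.29% to 6.82%.
print(f"{relative_wer_reduction(85.71, 93.18):.1f}%")  # prints "52.3%"
```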
Cite as: Suzuki, M., Tomita, S., Morita, T. (2019) Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes. Proc. Interspeech 2019, 3238-3241, doi: 10.21437/Interspeech.2019-1318
@inproceedings{suzuki19b_interspeech, author={Motoyuki Suzuki and Sho Tomita and Tomoki Morita}, title={{Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes}}, year=2019, booktitle={Proc. Interspeech 2019}, pages={3238--3241}, doi={10.21437/Interspeech.2019-1318} }