
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Authors:

Abstract

This work addresses the problem of matching musical audio directly to sheet music, without relying on any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio–sheet music dataset, which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet music of high complexity (e.g., nearly the complete solo piano works of Frédéric Chopin) and commercial recordings by famous concert pianists. Our results suggest that the proposed method, in combination with the large-scale dataset, yields retrieval models that generalize successfully to data well beyond the synthetic training material used for model building.
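Once the two encoders map audio and sheet music excerpts into a shared embedding space, retrieval reduces to nearest-neighbor search over embedding vectors. The following is a minimal sketch of that retrieval step only, not the paper's implementation: `audio_encoder` and `sheet_encoder` are hypothetical placeholders for the trained multimodal CNNs, and cosine similarity is assumed as the comparison measure.

```python
# Hypothetical sketch: cross-modal retrieval in a learned joint embedding space.
# The random vectors below stand in for outputs of the trained encoders
# (e.g., audio_encoder(spectrogram_excerpt), sheet_encoder(score_image_excerpt)).

import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve(query_embedding, candidate_embeddings, top_k=5):
    """Rank candidate excerpts (e.g., sheet snippets) for one query (e.g., audio).

    query_embedding:      shape (d,)   -- embedding of the query excerpt
    candidate_embeddings: shape (n, d) -- embeddings of all candidate excerpts
    Returns the indices of the top_k most similar candidates.
    """
    q = l2_normalize(query_embedding)
    c = l2_normalize(candidate_embeddings)
    similarities = c @ q                     # cosine similarities, shape (n,)
    return np.argsort(-similarities)[:top_k]


# Toy usage with random stand-ins for encoder outputs (d = 32 dimensions).
rng = np.random.default_rng(0)
audio_query = rng.normal(size=32)                # embedding of one audio excerpt
sheet_candidates = rng.normal(size=(1000, 32))   # embeddings of 1000 sheet excerpts
print(retrieve(audio_query, sheet_candidates, top_k=3))
```

Piece identification would then be built on top of such excerpt-level matches by aggregating the retrieval results of all excerpts of a query recording (for example by a simple vote over the retrieved pieces); the exact aggregation scheme used in the paper is not reproduced here.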

Keywords:

Multimodal embedding space learning; audio–sheet music retrieval
  • Year: 2018
  • Volume: 1, Issue: 1
  • Page/Article: 22-33
  • DOI: 10.5334/tismir.12
  • Submitted on 25 Jan 2018
  • Accepted on 20 Mar 2018
  • Published on 4 Sep 2018
  • Peer Reviewed