Published November 4, 2023 | Version v1
Conference paper Open

Introducing DiMCAT for Processing and Analyzing Notated Music on a Very Large Scale

Description

As corpora of digital musical scores continue to grow, the need for research tools that manipulate such data efficiently, offer an intuitive interface, and support a diversity of file formats becomes increasingly pressing. In response, this paper introduces the Digital Musicology Corpus Analysis Toolkit (DiMCAT), a Python library for processing large corpora of digitally encoded musical scores. Aimed equally at music-analytical corpus studies, MIR, and machine-learning research, DiMCAT performs common data transformations and analyses using dataframes. Dataframes reduce the inherent complexity of atomic score contents (e.g., notes), larger score entities (e.g., measures), and abstractions (e.g., chord symbols) to easily manipulable computational structures whose vectorized operations scale to large quantities of musical material. The design of DiMCAT's API prioritizes computational speed and ease of use, aiming to cater to machine-learning practitioners and musicologists alike.
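To make the dataframe idea concrete, the following is a minimal sketch using plain pandas only. It does not use or reproduce DiMCAT's API; the table layout and the column names (`piece`, `measure`, `midi`, `duration_qb`) are assumptions chosen for illustration. It shows how a table with one row per note can be transformed and aggregated with vectorized operations rather than explicit loops over notes.

```python
import pandas as pd

# Hypothetical note table: one row per note. The column names are
# illustrative assumptions, not DiMCAT's actual schema.
notes = pd.DataFrame(
    {
        "piece": ["a", "a", "a", "b", "b"],
        "measure": [1, 1, 2, 1, 1],
        "midi": [60, 64, 67, 62, 74],              # MIDI pitch numbers
        "duration_qb": [1.0, 1.0, 2.0, 0.5, 1.5],  # length in quarter notes
    }
)

# Vectorized transformation: the pitch class of every note in one operation.
notes["pitch_class"] = notes["midi"] % 12

# Aggregation over a larger score entity: number of notes per measure.
notes_per_measure = notes.groupby(["piece", "measure"]).size()

# Duration-weighted pitch-class profile per piece
# (rows: pieces, columns: pitch classes that occur in the data).
pc_profile = (
    notes.groupby(["piece", "pitch_class"])["duration_qb"]
    .sum()
    .unstack(fill_value=0.0)
)

print(pc_profile)
```

Because each step operates on whole columns, the same code applies unchanged whether the table holds five notes or several million.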

Files

000061.pdf (158.9 kB)
md5:2342d7e9037308061cd78ffcfd9c94b5