Music genre classification using LBP textural features
Highlights
► Music genre classification using LBP texture descriptors extracted from spectrograms.
► Performance evaluated with local feature extraction and with global feature extraction.
► Results compared with the state of the art on the Latin Music Database and the ISMIR 2004 database.
Introduction
With the rapid expansion of the Internet, a huge amount of data from different sources has become available online. Studies indicate that in 2007 the digital data scattered around the world amounted to about 281 exabytes. In 2011, the amount of digital information produced in the year was expected to reach nearly 1800 exabytes, or 10 times that produced in 2006 [1].
Among all the different sources of information, music is certainly one that can benefit from this impressive growth, since it can be shared by users with different backgrounds and education, easily crossing cultural and language barriers [2]. In general, indexing and retrieving music is based on meta-information tags such as ID3 tags. This metadata includes information such as song title, artist, album, year, musical genre, etc. [3]. Among all these descriptors, musical genre is probably the most obvious one that comes to mind, and it is probably the most widely used to organize and manage large digital music databases [4].
Taking previous works into account, we can find different reasons that motivate research on automatic music genre classification. McKay and Fujinaga [5] pointed out that individuals differ not only in how they classify a given recording, but also in the pool of genre labels from which they choose. On the other hand, Gjerdingen and Perrott [6] claimed that people are consistent in their genre categorization, even when these categorizations are wrong, or for very short segments. Pachet and Cazaly [7] showed that some traditional music taxonomies, such as the music industry taxonomy and Internet taxonomies, are very inconsistent. Finally, Pampalk [8] argues that genre classification-based evaluations can be used as a proxy for listening tests of music similarity.
According to Lidy et al. [9], there are different approaches to describing the content of a given piece of music. The most commonly used is the content-based approach, which extracts representative features from the digital audio signal. Other approaches, such as semantic analysis and community metadata, have proved to perform well for traditional Western music; however, their use for other kinds of music is compromised because both community metadata and lyrics-based approaches depend on natural language processing (NLP) tools, which are typically more developed for English than for other languages.
In the case of the content-based approach, one of the earliest works was introduced by Tzanetakis and Cook [10], who represented a music piece using timbral texture, beat-related, and pitch-related features. The employed feature set has become publicly available as part of the MARSYAS framework (Music Analysis, Retrieval and SYnthesis for Audio Signals), and it has been widely used for music genre recognition [3], [9], [11]. Other characteristics, such as Inter-Onset Interval Histogram Coefficients, Rhythm Patterns, and its derivatives Statistical Spectrum Descriptors and Rhythm Histograms, have been proposed in the literature more recently [12], [13], [14].
Despite all the efforts of recent years, automatic music genre classification remains an open problem. McKay and Fujinaga [5] pointed out some problematic aspects of genre and referred to experiments in which human beings were not able to classify correctly more than 76% of the music pieces [15]. Although more experimental evidence is needed, these experiments give some insight into the upper bounds on software performance. McKay and Fujinaga also suggest that different approaches should be proposed to achieve further improvements.
In light of this, in this paper we propose an alternative approach for music genre classification which converts the audio signal into a spectrogram [16] (a short-time Fourier representation) and then extracts features from this visual representation. The rationale is that, by treating the time-frequency representation as a texture image, we can extract features that are expected to be suitable for building a robust music genre classification system, even if there is no direct relation between the musical dimensions and the extracted features. Furthermore, these image-based features may capture information different from that captured by approaches that work directly with the audio signal. Fig. 1 illustrates two examples of spectrograms taken from music pieces of different genres. Fig. 1(a) shows a spectrogram taken from a classical music piece. In this case there is a very clear presence of almost horizontal lines, related to harmonic structures, while in Fig. 1(b) one can observe the intensive beats, typical of electronic music, depicted as clear vertical lines. The features used in this work are provided by the Local Binary Pattern (LBP), a structural texture operator presented by Ojala et al. [17].
By analyzing the spectrogram images, we noticed that the textures are not uniform, so we decided to consider local feature extraction in addition to global feature extraction. Furthermore, our previous results [18] have shown that, using Gray Level Co-occurrence Matrix (GLCM) descriptors, local feature extraction can help to improve results in music genre classification using spectrograms. With this in mind, we have studied different zoning techniques to obtain local information about the given pattern beyond the global feature extraction. We also demonstrate through experimentation that certain zones of the spectrogram perform better than others.
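To make the texture descriptor concrete, the sketch below computes basic 8-neighbour, radius-1 LBP codes and builds one normalized histogram per horizontal frequency band of the image. It is a minimal NumPy illustration: the band count, the plain (non-uniform) LBP variant, and the random stand-in image are assumptions for demonstration, not the exact configuration used in the paper.

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour, radius-1 LBP code for each interior pixel."""
    c = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from the top-left corner
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        # set this bit wherever the neighbour is >= the centre pixel
        codes |= (neigh >= c).astype(np.uint8) << bit
    return codes

def zoned_lbp_histograms(img, n_zones=3):
    """Split the image into horizontal frequency bands (zones) and
    histogram the LBP codes of each band separately (local extraction)."""
    codes = lbp_image(img)
    bands = np.array_split(codes, n_zones, axis=0)
    feats = [np.bincount(b.ravel(), minlength=256) / b.size for b in bands]
    return np.concatenate(feats)

spec = np.random.rand(96, 128)   # stand-in for a spectrogram image
fv = zoned_lbp_histograms(spec)  # 3 zones x 256-bin histogram
```

With 3 zones this yields a 768-dimensional feature vector, one 256-bin histogram per band; a global extraction is the special case `n_zones=1`.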
The use of spectrograms in music genre classification has already been proposed in other works [18], [19], [20]. However, some important issues remain overlooked. Thus, some innovations are presented here, such as: the use of the LBP structural approach to obtain texture descriptors from the spectrogram; a zoning mechanism that takes human perception into account when setting up frequency bands; the creation of one individual classifier for each zone, with their outputs combined to reach the final decision; and a comparison of results with and without the zoning mechanism using a structural texture descriptor.
Through a set of comprehensive experiments on the Latin Music Database (LMD) [21] and on the ISMIR 2004 database [22], we demonstrate that in most cases the proposed approach compares favorably to the traditional approaches reported in the literature. The results obtained with the LMD in this work can be directly compared with those obtained by Lopes et al. [23] and Costa et al. [18], since all of them used the artist filter restriction and folds with exactly the same music pieces for classifier training and testing. The overall recognition rate improvement was about 22.66% when compared with [23], and about 15.13% when compared with the best result obtained in [18]. Taking into account the best results obtained with the LMD in the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and MIREX 2010 [24] competitions, the improvement was about 7.67% and 2.47%, respectively. Concerning the ISMIR 2004 database, the obtained results are comparable to those described in the literature. In addition, these results corroborate the versatility of the proposed approach.
The remainder of this paper is organized as follows: Section 2 describes the music databases used in the experiments. Section 3 presents the LBP texture operator used to extract features in this work. Section 4 introduces the methodology used for classification, while Section 5 reports all the experiments that have been carried out on music genre classification. Finally, the last section presents the conclusions of this work and opens up some perspectives for future work.
Section snippets
Music databases
The LMD and the ISMIR 2004 database are among the most widely used music databases for research in Music Information Retrieval. These two databases were chosen because, given the signal segmentation strategy described in Section 3, they are among the databases that could be used.
Feature extraction
Since our approach is based on the visual representation of the audio signal, the first step of the feature extraction process consists of converting the audio signal into a spectrogram. In the LMD, the spectrograms were created using audio files with the following technical features: bit rate of 352 kbps, audio sample size of 16 bits, one channel, and audio sample rate of 22.05 kHz. In the ISMIR 2004 database, the audio files used had the following technical features: bit rate of 706 kbps, audio
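The conversion step can be sketched with SciPy's short-time Fourier routine. Only the 22.05 kHz sample rate and the 10-second segment length come from the text; the window length, overlap, log scaling, and the synthetic test signal below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

sr = 22050                            # LMD sample rate: 22.05 kHz, mono
t = np.arange(0, 10.0, 1.0 / sr)      # one 10-second segment
signal = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for real audio

# Short-time Fourier representation: rows = frequency bins, cols = time frames
freqs, times, Sxx = spectrogram(signal, fs=sr, nperseg=1024, noverlap=512)

# Log scaling (dB) makes the texture of the resulting image more apparent
img = 10.0 * np.log10(Sxx + 1e-10)
```

The resulting `img` array is the grayscale texture image from which the LBP descriptors are then extracted.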
Methodology used for classification
The classifier used in this work was the Support Vector Machine (SVM) introduced by Vapnik [28]. Normalization was performed by linearly scaling each attribute to a fixed range. Different parameters and kernels for the SVM were tried out, but the best results were achieved using a Gaussian kernel. The cost and gamma parameters were tuned using a grid search.
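A minimal sketch of this training setup using scikit-learn follows; the grid values, the [0, 1] scaling range, and the toy data are assumptions for illustration, not the settings tuned in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))          # stand-in feature vectors
y = (X[:, 0] > 0).astype(int)         # toy binary genre labels

# Linear scaling of each attribute (here to [0, 1], an assumed range)
X = MinMaxScaler().fit_transform(X)

# Grid search over cost (C) and gamma for the Gaussian (RBF) kernel;
# the grid values below are illustrative only.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                    cv=3)
grid.fit(X, y)
best = grid.best_params_
```

`grid.best_params_` then holds the selected cost/gamma pair, and `grid.best_estimator_` the classifier retrained with it.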
The classification process is done as follows: the three 10-s segments of the music are converted into spectrograms,
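A simple way to combine the outputs of the per-zone (or per-segment) classifiers is a late-fusion rule over their probability estimates. The sum rule and the numbers below are hypothetical illustrations; the specific combination rule used in the paper may differ.

```python
import numpy as np

# Hypothetical per-zone probability outputs for one music piece:
# rows = zones (one classifier per zone), columns = genre classes.
zone_probs = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
])

# Sum rule: add the probability estimates across zones and
# pick the class with the highest accumulated score.
final_scores = zone_probs.sum(axis=0)
predicted_genre = int(np.argmax(final_scores))
```

The sum rule is one of the standard fixed combination rules for classifier ensembles; product and majority-vote rules are common alternatives.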
Experimental results and discussion
The following subsections present the experiments carried out with global feature extraction and with the three different zoning mechanisms proposed in the previous section. Additional experiments carried out using acoustic features are also presented. The experimental results reported on the LMD refer to the average classification rates and standard deviations over the three aforementioned folds.
Conclusion
In this paper we have presented an alternative approach for music genre classification based on texture images. Such visual representations are created by converting the audio signal into spectrogram images, which can be divided into zones so that features can be extracted locally. We have demonstrated that, with LBP, there is a slight difference in recognition rate when different zoning mechanisms are used and when a global feature extraction is performed,
Acknowledgments
This research has been partly supported by The National Council for Scientific and Technological Development (CNPq) grant 301653/2011-9, CAPES grant BEX 5779/11-1 and 223/09-FCT595-2009, Araucária Foundation grant 16767-424/2009, European Commission, FP7 (Seventh Framework Programme), ICT-2011.1.5 Networked Media and Search Systems, grant agreement No 287711; and the European Regional Development Fund through the Programme COMPETE and by National Funds through the Portuguese Foundation for
References (36)
- Automatic identification of audio recordings based on statistical modeling, Signal Processing (2010).
- On the suitability of state-of-the-art music information retrieval methods for analyzing, categorizing and accessing non-western and ethnic music collections, Signal Processing (2010).
- From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition (2008).
- J. Gantz, C. Chute, A. Manfrediz, S. Minton, D. Reinsel, W. Schlichting, A. Toncheva, The Diverse and Exploding Digital...
- Feature selection approach for automatic music genre classification, International Journal of Semantic Computing (2009).
- Representing musical genre: a state of the art, Journal of New Music Research (2003).
- C. McKay, I. Fujinaga, Musical genre classification: is it worth pursuing and how can it be improved? in: 7th...
- Scanning the dial: the rapid recognition of music genres, Journal of New Music Research (2008).
- F. Pachet, D. Cazaly, A taxonomy of musical genres, in: Proceedings of Content-Based Multimedia Information Access...
- E. Pampalk, Computational Models of Music Similarity and Their Application in Music Information Retrieval, PhD Thesis,...
- Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing.
- Spectrograms: turning signals into pictures, Journal of Engineering Technology.
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence.