Elsevier

Signal Processing

Volume 92, Issue 11, November 2012, Pages 2723-2737

Music genre classification using LBP textural features

https://doi.org/10.1016/j.sigpro.2012.04.023

Abstract

In this paper we present an approach to music genre classification that converts an audio signal into spectrograms and extracts texture features from these time-frequency images, which are then used to model music genres in a classification system. The texture features are based on the Local Binary Pattern (LBP), a structural texture operator that has been successful in recent image classification research. Experiments are performed on two well-known datasets: the Latin Music Database (LMD) and the ISMIR 2004 dataset. The proposed approach considers different zoning mechanisms to perform local feature extraction, and results obtained with and without local feature extraction are compared. We compare the performance of the texture features with that of commonly used audio content-based features (i.e., from the MARSYAS framework) and show that the texture features always outperform the audio content-based features. We also compare our results with results from the literature. On the LMD, the performance of our approach reaches about 82.33%, above the best result obtained in the MIREX 2010 competition on that dataset. On the ISMIR 2004 database, the best result obtained is about 80.65%, i.e., below the best result on that dataset found in the literature.

Highlights

► Music genre classification using LBP texture descriptors extracted from spectrograms.
► Evaluate performance with local feature extraction and with global feature extraction.
► Compare results with the state of the art on the Latin Music Database and on the ISMIR 2004 database.

Introduction

With the rapid expansion of the Internet, a huge amount of data from different sources has become available online. Studies indicate that in 2007 the amount of digital data scattered around the world amounted to about 281 exabytes. In 2011, the amount of digital information produced in that year was expected to reach nearly 1800 exabytes, or 10 times that produced in 2006 [1].

Among all the different sources of information, music is certainly one that can benefit from this impressive growth, since it can be shared by users with different backgrounds and education, easily crossing cultural and language barriers [2]. In general, indexing and retrieving music is based on meta-information tags such as ID3 tags. This metadata includes information such as song title, artist, album, year, and musical genre [3]. Among all these descriptors, musical genre is probably the most obvious one that comes to mind, and it is probably the most widely used to organize and manage large digital music databases [4].

Previous works point to several reasons that motivate research on automatic music genre classification. McKay and Fujinaga [5] pointed out that individuals differ in how they classify a given recording, but they can also differ in terms of the pool of genre labels from which they choose. On the other hand, Gjerdingen and Perrot [6] claimed that people are consistent in their genre categorization, even when these categorizations are wrong or are made from very short segments. Pachet and Cazaly [7] showed that some traditional music taxonomies, such as those of the music industry and Internet taxonomies, are very inconsistent. Finally, Pampalk [8] argues that genre classification-based evaluations can be used as a proxy for listening tests of music similarity.

According to Lidy et al. [9] there are different approaches to describe the contents of a given piece of music. The most commonly used is the content-based approach, which extracts representative features from the digital audio signal. Other approaches, such as semantic analysis and community metadata, have proved to perform well for traditional Western music; however, their use for other kinds of music is compromised because both community metadata and lyrics-based approaches depend on natural language processing (NLP) tools, which are typically more developed for English than for other languages.

In the case of the content-based approach, one of the earliest works was introduced by Tzanetakis and Cook [10], who represented a music piece using timbral texture, beat-related, and pitch-related features. The employed feature set has become publicly available as part of the MARSYAS framework (Music Analysis, Retrieval and SYnthesis for Audio Signals) and has been widely used for music genre recognition [3], [9], [11]. Other features, such as Inter-Onset Interval Histogram Coefficients, Rhythm Patterns and their derivatives Statistical Spectrum Descriptors and Rhythm Histograms, have been proposed in the literature more recently [12], [13], [14].

Despite all the efforts of recent years, automatic music genre classification still remains an open problem. McKay and Fujinaga [5] pointed out some problematic aspects of genre and referred to experiments in which human beings were not able to correctly classify more than 76% of the music pieces [15]. Although more experimental evidence is needed, these experiments give some insight into the upper bounds on software performance. McKay and Fujinaga also suggest that different approaches should be proposed to achieve further improvements.

In light of this, in this paper we propose an alternative approach for music genre classification which converts the audio signal into a spectrogram [16] (short-time Fourier representation) and then extracts features from this visual representation. The rationale is that, by treating the time-frequency representation as a texture image, we can extract features that are expected to be suitable for building a robust music genre classification system, even if there is no direct relation between the musical dimensions and the extracted features. Furthermore, these image-based features may capture information different from that of approaches that work directly on the audio signal. Fig. 1 illustrates two examples of spectrograms taken from music pieces of different genres. Fig. 1(a) shows a spectrogram taken from a classical music piece; in this case there is a very clear presence of almost horizontal lines, related to harmonic structures, while in Fig. 1(b) one can observe the intense beats typical of electronic music, depicted as clear vertical lines. The features used in this work are provided by the Local Binary Pattern (LBP), a structural texture operator presented by Ojala et al. [17].
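As a purely illustrative sketch (the library choice and the 8-point, radius-1 neighborhood are assumptions for the example, not necessarily the configuration used in this work), a texture descriptor of this kind can be computed from a grayscale spectrogram image with an off-the-shelf LBP implementation:

```python
# Hedged sketch: uniform LBP histogram of a grayscale spectrogram image.
# The neighborhood (8 points, radius 1) is illustrative; the LBP
# configuration actually used is given in Section 3.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(image, n_points=8, radius=1):
    """Normalized histogram of uniform LBP codes for a 2-D grayscale image."""
    codes = local_binary_pattern(image, n_points, radius, method="uniform")
    n_bins = n_points + 2  # the "uniform" mapping yields n_points + 2 codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist
```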

By analyzing the spectrogram images, we have noticed that the textures are not uniform, so we decided to consider local feature extraction in addition to global feature extraction. Furthermore, our previous results [18] have shown that, with Gray Level Co-occurrence Matrix (GLCM) descriptors, local feature extraction can help to improve music genre classification using spectrograms. With this in mind, we have studied different zoning techniques to obtain local information about the given pattern beyond the global feature extraction. We also demonstrate through experimentation that certain zones of the spectrogram perform better than others.
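The sketch below illustrates the idea of local feature extraction under simplified assumptions (the number of zones is arbitrary and does not correspond to the zoning schemes evaluated later): it splits the spectrogram image into horizontal zones, i.e. frequency bands, and computes one LBP histogram per zone, reusing the lbp_histogram helper sketched above.

```python
# Hedged sketch: per-zone LBP features from horizontal frequency bands.
# The number of zones is a placeholder; the zoning mechanisms actually
# evaluated (including perception-based bands) are described later.
def zone_features(image, n_zones=5, n_points=8, radius=1):
    zones = np.array_split(image, n_zones, axis=0)  # split along the frequency axis
    return [lbp_histogram(zone, n_points, radius) for zone in zones]
```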

The use of spectrograms in music genre classification has already been proposed in other works [18], [19], [20]. However, some important issues remain overlooked. Thus, some innovations are presented here, such as: the use of the LBP structural approach to obtain texture descriptors from the spectrogram; a zoning mechanism that takes human perception into account when setting up frequency bands; the creation of one individual classifier for each zone, with their outputs combined to obtain the final decision; and a comparison of results with and without the zoning mechanism using a structural texture descriptor.

Through a set of comprehensive experiments on the Latin Music Database [21] and on the ISMIR 2004 database [22], we demonstrate that in most cases the proposed approach compares favorably to the traditional approaches reported in the literature. The results obtained with the LMD in this work can be directly compared with those obtained by Lopes et al. [23] and Costa et al. [18], since all of them used the artist filter restriction and folds with exactly the same music pieces for classifier training and testing. The overall recognition rate improvement was about 22.66% when compared with [23], and about 15.13% when compared with the best result obtained in [18]. Taking into account the best results obtained with the LMD in the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and MIREX 2010 [24] competitions, the improvement was about 7.67% and 2.47%, respectively. Concerning the ISMIR 2004 database, the obtained results are comparable to those described in the literature. In addition, these results corroborate the versatility of the proposed approach.

The remainder of this paper is organized as follows: Section 2 describes the music databases used in the experiments. Section 3 presents the LBP texture operator used to extract features in this work. Section 4 introduces the methodology used for classification, while Section 5 reports all the experiments that have been carried out on music genre classification. Finally, the last section presents the conclusions of this work and opens up some perspectives for future work.

Section snippets

Music databases

The LMD and the ISMIR 2004 database are among the most widely used music databases for research in Music Information Retrieval. These two databases were chosen because, taking into account the signal segmentation strategy described in Section 3, they are among the databases that could be used.

Feature extraction

Since our approach is based on the visual representation of the audio signal, the first step of the feature extraction process consists of converting the audio signal into a spectrogram. In the LMD, the spectrograms were created from audio files with the following technical characteristics: bit rate of 352 kbps, audio sample size of 16 bits, one channel, and audio sample rate of 22.05 kHz. In the ISMIR 2004 database, the audio files used had the following technical characteristics: bit rate of 706 kbps, audio
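As an illustration of this first step only (the SciPy-based implementation, window length, and overlap are assumptions for the example, not the settings used in this work), a mono 22.05 kHz audio file can be converted into a grayscale spectrogram image as follows:

```python
# Hedged sketch: audio file -> log-magnitude spectrogram -> 8-bit grayscale image.
import numpy as np
from scipy import signal
from scipy.io import wavfile

def to_spectrogram_image(wav_path):
    sr, audio = wavfile.read(wav_path)                    # 16-bit mono WAV assumed
    f, t, sxx = signal.spectrogram(audio.astype(float), fs=sr,
                                   nperseg=1024, noverlap=512)
    sxx_db = 10.0 * np.log10(sxx + 1e-10)                 # log magnitude (dB)
    span = sxx_db.max() - sxx_db.min()
    # Scale to an 8-bit grayscale image suitable for texture analysis.
    return (255 * (sxx_db - sxx_db.min()) / (span + 1e-10)).astype(np.uint8)
```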

Methodology used for classification

The classifier used in this work was the Support Vector Machine (SVM) introduced by Vapnik [28]. Normalization was performed by linearly scaling each attribute to the range [−1, +1]. Different parameters and kernels for the SVM were tried out, but the best results were achieved using a Gaussian kernel. The cost and gamma parameters were tuned using a grid search.
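This setup can be sketched with scikit-learn as follows; the grid values are placeholders and not necessarily those explored in this work:

```python
# Hedged sketch: scaling to [-1, +1], RBF (Gaussian) kernel SVM, and a grid
# search over the cost (C) and gamma parameters.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

pipeline = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
param_grid = {"svc__C": [2 ** k for k in range(-5, 16, 2)],
              "svc__gamma": [2 ** k for k in range(-15, 4, 2)]}
search = GridSearchCV(pipeline, param_grid, cv=3)
# search.fit(X_train, y_train); predictions = search.predict(X_test)
```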

The classification process is carried out as follows: the three 10-s segments of the music are converted into spectrograms (Ῡ_beg, Ῡ_mid,
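The snippet above is truncated. Purely as a hedged illustration (not necessarily the combination rule adopted in this work), one simple way to fuse the decisions obtained for the three segment spectrograms is a majority vote:

```python
# Hedged sketch: majority vote over per-segment genre predictions.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among per-segment predictions."""
    return Counter(labels).most_common(1)[0][0]

# Example: majority_vote(["salsa", "salsa", "tango"]) -> "salsa"
```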

Experimental results and discussion

The following subsections present the experiments carried out with the global feature extraction and with the three different zoning mechanisms proposed in the previous section. Additional experiments carried out using acoustic features are also presented. The experimental results reported on the LMD refer to the average classification rates and standard deviations over the three aforementioned folds.

Conclusion

In this paper we have presented an alternative approach for music genre classification which is based on texture images. Such visual representations are created by converting the audio signal into spectrogram images, which can be divided into zones so that features can be extracted locally. We have demonstrated that, with LBP, there is a slight difference in terms of recognition rate when different zoning mechanisms are used and when a global feature extraction is performed,

Acknowledgments

This research has been partly supported by The National Council for Scientific and Technological Development (CNPq) grant 301653/2011-9, CAPES grant BEX 5779/11-1 and 223/09-FCT595-2009, Araucária Foundation grant 16767-424/2009, European Commission, FP7 (Seventh Framework Programme), ICT-2011.1.5 Networked Media and Search Systems, grant agreement No 287711; and the European Regional Development Fund through the Programme COMPETE and by National Funds through the Portuguese Foundation for

References (36)

  • G. Tzanetakis et al., Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing (2002)
  • T. Li, M. Ogihara, Q. Li, A comparative study on content-based music genre classification, in: 26th Annual...
  • F. Gouyon, S. Dixon, E. Pampalk, G. Widmer, Evaluating rhythmic descriptors for musical genre classification, in: 25th...
  • A. Rauber, E. Pampalk, D. Merkl, Using psycho-acoustic models and self-organizing maps to create a hierarchical...
  • T. Lidy, A. Rauber, Evaluation of feature extractors and psycho-acoustic transformations for music genre...
  • S. Lippens, J.P. Martens, M. Leman, B. Baets, H. Meyer, G. Tzanetakis, A comparison of human and automatic musical...
  • M.R. French et al., Spectrograms: turning signals into pictures, Journal of Engineering Technology (2007)
  • T. Ojala et al., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)