Hilbert spectrum based features for speech/music classification

Kumar, Arvind; Solanki, Sandeep Singh; Chandra, Mahesh

Serbian Journal of Electrical Engineering, 2022, Volume 19, Issue 2, Pages 239-259
https://doi.org/10.2298/SJEE2202239K

Kumar Arvind (Department of ECE, Birla Institute of Technology, Ranchi, India), arvind9835@gmail.com
Solanki Sandeep Singh (Department of ECE, Birla Institute of Technology, Ranchi, India), sssolanki@bitmesra.com
Chandra Mahesh (Department of ECE, Reva University, Bengaluru), shrotriya69@rediffmail.com

Automatic speech/music classification uses signal processing techniques to categorize multimedia content into classes. The proposed work explores the Hilbert Spectrum (HS), obtained from the AM-FM components of an audio signal, also called Intrinsic Mode Functions (IMFs), to classify an incoming audio signal as speech or music. The HS is a two-dimensional representation of the instantaneous energies (IE) and instantaneous frequencies (IF) obtained by applying the Hilbert Transform to the IMFs. The HS is further processed using a Mel filter bank and the Discrete Cosine Transform (DCT) to generate novel IF- and Instantaneous Amplitude (IA)-based cepstral features. The results were validated on three databases: the Scheirer and Slaney (S&S) database, GTZAN, and MUSAN. To evaluate the general applicability of the proposed features, extensive experiments were conducted on different combinations of audio files from the S&S, GTZAN, and MUSAN databases, with promising results. Finally, the performance of the system is compared with that of existing cepstral features and of previous works in this domain.

Keywords: EMD, Hilbert Spectrum, Hilbert Huang Transform, Cepstral Features, Speech/Music Classification
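
The feature pipeline outlined in the abstract (IMFs, Hilbert Transform, Hilbert Spectrum, Mel filter bank, DCT) can be illustrated with a minimal sketch. This is not the authors' implementation. The EMD step is omitted, and two synthetic AM-FM components stand in for IMFs. The function names, the number of frequency bins, the number of Mel filters, and the number of cepstral coefficients are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_energy_frequency(imf, fs):
    """Instantaneous energy |z|^2 and frequency (Hz) of one component."""
    z = analytic_signal(imf)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2.0 * np.pi)  # one sample shorter than imf
    energy = np.abs(z) ** 2
    return energy[1:], freq

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, centers_hz, fs):
    """Triangular filters evaluated at the Hilbert-spectrum bin centres."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, len(centers_hz)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (centers_hz - lo) / (mid - lo)
        falling = (hi - centers_hz) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2.0 * n)) @ x

def hs_cepstrum(imfs, fs, n_bins=128, n_filters=20, n_coeffs=13):
    """Accumulate energy over frequency bins, then Mel filtering, log and DCT."""
    bin_edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    spectrum = np.zeros(n_bins)
    for imf in imfs:
        energy, freq = instantaneous_energy_frequency(imf, fs)
        valid = (freq >= 0.0) & (freq <= fs / 2.0)
        hist, _ = np.histogram(freq[valid], bins=bin_edges, weights=energy[valid])
        spectrum += hist
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    mel_energies = mel_filterbank(n_filters, centers, fs) @ spectrum
    return dct2(np.log(mel_energies + 1e-10), n_coeffs)

# Synthetic stand-ins for IMFs (assumption: real IMFs would come from EMD).
fs = 8000
t = np.arange(fs) / fs
imfs = [
    (1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)) * np.cos(2 * np.pi * 1000 * t),
    0.5 * np.cos(2 * np.pi * 200 * t),
]
features = hs_cepstrum(imfs, fs)
```

In this sketch the time axis of the Hilbert Spectrum is collapsed over the whole signal, which gives one feature vector per input. A frame-wise variant would apply the same histogram step to short segments of the instantaneous energy and frequency tracks.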


ISSN - 2683-3867
COBISS.SR-ID - 278404108