Serbian Journal of Electrical Engineering 2022 Volume 19, Issue 2, Pages: 239-259
https://doi.org/10.2298/SJEE2202239K


Hilbert spectrum based features for speech/music classification

Kumar Arvind (Department of ECE, Birla Institute of Technology, Ranchi, India), arvind9835@gmail.com
Solanki Sandeep Singh (Department of ECE, Birla Institute of Technology, Ranchi, India), sssolanki@bitmesra.com
Chandra Mahesh (Department of ECE, Reva University, Bengaluru), shrotriya69@rediffmail.com

Automatic speech/music classification uses signal processing techniques to sort multimedia content into classes. The proposed work explores the Hilbert Spectrum (HS), obtained from the AM-FM components of an audio signal, also called Intrinsic Mode Functions (IMFs), to classify an incoming audio signal as speech or music. The HS is a two-dimensional representation of the instantaneous energies (IE) and instantaneous frequencies (IF) obtained by applying the Hilbert transform to the IMFs. The HS is further processed with a Mel filter bank and the Discrete Cosine Transform (DCT) to generate novel cepstral features based on IF and instantaneous amplitude (IA). The results were validated on three databases: the Scheirer and Slaney (S&S) database, GTZAN, and MUSAN. To evaluate the general applicability of the proposed features, extensive experiments were conducted on different combinations of audio files from the S&S, GTZAN, and MUSAN databases, and promising results were achieved. Finally, the performance of the system is compared with that of existing cepstral features and of previous work in this domain.

Keywords: EMD, Hilbert Spectrum, Hilbert Huang Transform, Cepstral Features, Speech/Music Classification
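
The feature pipeline summarized in the abstract (IMF, Hilbert transform, IF and IA, Mel-scale grouping, DCT) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the EMD step is skipped and a synthetic 1 kHz cosine stands in for a single IMF, the Mel filter bank is approximated by rectangular, non-overlapping Mel-spaced bands rather than triangular filters, and all function names and parameter values (20 bands, 12 coefficients, 16 kHz sampling rate) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.fft import dct


def instantaneous_amp_freq(imf, fs):
    """Instantaneous amplitude and frequency (Hz) of one IMF via the analytic signal."""
    analytic = hilbert(imf)
    amp = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    # Frequency is the phase derivative; np.diff shortens the array by one sample.
    freq = np.diff(phase) * fs / (2.0 * np.pi)
    return amp[:-1], freq


def hz_to_mel(f):
    """Common Hz-to-Mel mapping (an assumption; the paper may use another variant)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_band_energies(amp, freq, fs, n_bands=20):
    """Sum instantaneous energy (amp**2) into rectangular Mel-spaced bands.

    Simplification: real Mel filter banks use overlapping triangular filters.
    """
    edges = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bands + 1)
    mel_freq = hz_to_mel(np.clip(freq, 0.0, fs / 2.0))
    idx = np.clip(np.digitize(mel_freq, edges) - 1, 0, n_bands - 1)
    return np.bincount(idx, weights=amp ** 2, minlength=n_bands)


def cepstral_features(band_energies, n_coeffs=12):
    """Log-compress the band energies and decorrelate them with a DCT-II."""
    log_energies = np.log(band_energies + 1e-10)  # small offset avoids log(0)
    return dct(log_energies, type=2, norm="ortho")[:n_coeffs]


# Hypothetical stand-in for one IMF: a 1 kHz cosine, 0.1 s long, sampled at 16 kHz.
fs = 16000
t = np.arange(1600) / fs
imf = np.cos(2.0 * np.pi * 1000.0 * t)

amp, freq = instantaneous_amp_freq(imf, fs)
energies = mel_band_energies(amp, freq, fs)
features = cepstral_features(energies)
```

In a fuller system, the same steps would presumably be applied per frame and across all IMFs produced by EMD before the features are passed to a classifier; those details are not given in the abstract and are left out here.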

