Research on Ethnic Original Ecological Music Clustering of MFCC Based on HHT

Considering the characteristics of ethnic original ecological music and the limits of human auditory perception, this paper presents an MFCC feature extraction method based on the Hilbert-Huang transform (HHT). In this method, ethnic original ecological music signals are decomposed into several intrinsic mode functions (IMFs) by EEMD, the Hilbert transform is applied to each IMF, and the resulting Hilbert marginal spectrum is passed through a Mel-scale filter bank to extract HHT-MFCC features. The experiment collected original ecological music from 13 ethnic minorities in Yunnan. K-means clustering was applied to the extracted features to compare the clustering performance of HHT-MFCC with that of MFCC. The results show that, relative to MFCC, HHT-MFCC is 0.49 higher in Purity, 1.2 lower in Entropy, and 3% higher in F-measure, indicating that HHT-MFCC features extracted from ethnic original ecological music outperform traditional MFCC features.


Introduction
Mel-frequency cepstral coefficients (MFCCs) were proposed by Davis and Mermelstein in 1980 and are widely used features in speech recognition. MFCC can effectively reflect the static characteristics of speech parameters [1]. To optimize speech feature extraction, Shao Mingqiang et al. [2] noted that existing algorithms damage the sound signal while removing noise, and proposed a speech feature extraction method based on MFCC_P, in which the filter banks are arranged neatly without overlap and noise is effectively eliminated. Li Zhizhong et al. [3] proposed the average Mel-frequency cepstral coefficient (AMFCC) and used it with a support vector machine for classification and recognition; the results showed that the improved AMFCC achieved a better recognition rate than LPCC and traditional MFCC. Kazi Fi et al. [4] used higher-order spectra to obtain phase entropy from musical instrument signals and showed that combining higher-order spectral features with MFCC could improve classification accuracy. To extract reasonable feature values from birdsong signals, Cheng Long et al. [5] proposed an improved MFCC algorithm based on empirical mode decomposition (EMD): the birdsong signal is decomposed by EMD, the FFT is computed, the spectrum is combined through a Mel filter bank, and the logarithmic energy is DCT transformed to obtain improved MFCC parameters, which are then used with a Gaussian mixture model (GMM) to identify birdsong. The experimental results showed that the improved MFCC is superior to the traditional MFCC.
Music is not speech: owing to the characteristics of folk music itself and the limits of human auditory perception, it differs greatly from speech in signal characteristics and structure [6]. Grosal Ariiit et al. [7] adopted MFCC as the classification feature and combined a support vector machine with RANSAC (Random Sample Consensus) to classify collected instrumental and vocal music genres; the experimental results showed that RANSAC performed better on highly variable data. Elias Pam Palkout et al. [8] developed a music clustering system based on the self-organizing map (SOM), which uses template matching to judge the type and clusters music by calculating the Euclidean distance between the template vector and the feature vector. In MFCC extraction, the Fourier transform is used to compute the frequency spectrum, which gives the method some robustness to noise, but the FFT lacks time-frequency localization and cannot capture the instantaneous dynamics of speech [9]. To address the drop in speech recognition rate of Mel-frequency cepstral coefficient (MFCC) parameters in noisy environments, Chen Shu et al. [10] proposed an improved feature parameter extraction method based on cochlear filter cepstral coefficients (CFCC). An improved linear discriminant analysis (LDA) algorithm transforms the extracted feature parameters to obtain more discriminative parameters and a diagonalized covariance matrix that meets the requirements of the HMM. Experimental results showed that the proposed method effectively improves the recognition rate and robustness of a speech recognition system in noisy environments.
The Hilbert-Huang transform (HHT) handles non-stationary, nonlinear signals very well [11]. Norden E. Huang et al. [12] proposed HHT because existing methods processed such signals unsatisfactorily, and the method gives relatively good results for non-stationary, nonlinear data analysis. When speech signals are decomposed by EMD and MFCC features are then extracted from the resulting IMFs, the recognition performance on the Berlin emotional speech database improves [13]. However, the IMFs obtained by EMD suffer from mode mixing, which makes the IMF components inaccurate [14]. Mode mixing is caused by discontinuities in the signal. N. E. Huang et al. proposed ensemble empirical mode decomposition (EEMD), which effectively overcomes the mode mixing present in EMD [15, 16].
Considering the characteristics of ethnic original ecological music and the limits of human auditory perception, this paper proposes an MFCC extraction algorithm based on HHT. K-means clustering analysis shows experimentally that the HHT-MFCC parameters are better than the MFCC feature parameters.

Cluster evaluation index
In this paper, the HHT-MFCC features of 13 categories of Yunnan ethnic original ecological music are evaluated with three clustering indexes: Purity, Entropy, and F-measure (Fβ). A clustering performance metric [17] is a standard for evaluating clustering results. Purity and F-measure take values in [0, 1], and the closer they are to 1, the better the clustering; for Entropy, the smaller the value, the better the clustering. In these indexes, p_ij is the probability that a member of cluster i belongs to class j, m_i is the number of members in cluster i, L is the number of classes, K is the number of clusters, and M is the number of members involved in the whole cluster division.
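The paper's formulas for the three indexes are not reproduced in this text. As a minimal sketch, assuming the commonly used definitions (Purity as the size-weighted maximum of p_ij per cluster, Entropy as the size-weighted Shannon entropy of p_ij per cluster, and the overall F-measure as the class-size-weighted best Fβ over clusters), the indexes could be computed as follows. The function names and the default beta = 1 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contingency(clusters, classes):
    """Count matrix n[i, j]: number of members of cluster i that belong to class j."""
    _, ci = np.unique(np.asarray(clusters), return_inverse=True)
    _, cj = np.unique(np.asarray(classes), return_inverse=True)
    n = np.zeros((ci.max() + 1, cj.max() + 1))
    np.add.at(n, (ci, cj), 1)
    return n

def purity(n):
    """Sum over clusters of (m_i / M) * max_j p_ij, with p_ij = n_ij / m_i."""
    return float(n.max(axis=1).sum() / n.sum())

def entropy(n):
    """Sum over clusters of (m_i / M) * (-sum_j p_ij * log2 p_ij)."""
    m_i = n.sum(axis=1, keepdims=True)          # cluster sizes
    p = n / m_i
    logp = np.log2(p, where=p > 0, out=np.zeros_like(p))
    e_i = -(p * logp).sum(axis=1)               # entropy of each cluster
    return float((m_i[:, 0] / n.sum() * e_i).sum())

def f_measure(n, beta=1.0):
    """Assumed definition: sum over classes of (class size / M) * max_i F_beta(i, j)."""
    m_i = n.sum(axis=1, keepdims=True)          # cluster sizes
    m_j = n.sum(axis=0, keepdims=True)          # class sizes
    precision = n / m_i
    recall = n / m_j
    denom = beta ** 2 * precision + recall
    f = np.divide((1 + beta ** 2) * precision * recall, denom,
                  out=np.zeros_like(n), where=denom > 0)
    return float((m_j[0] / n.sum() * f.max(axis=0)).sum())
```

With cluster labels from a clustering run and the ground-truth ethnic-group labels, `n = contingency(cluster_labels, class_labels)` gives the count matrix that the three functions consume.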

Hilbert-Huang Transform (HHT)
The Hilbert transform (HT) yields the complex envelope of a signal modulated by a real carrier and is used to describe the envelope of amplitude or phase modulation. The instantaneous frequency and instantaneous phase it provides simplify the analysis, and it has important theoretical significance and application value in communication systems. HHT is an adaptive time-frequency analysis method suited to non-stationary, nonlinear signals. It reflects the instantaneous dynamics of a signal and captures its local changes, and its basis functions [17] are generated from the characteristics of the signal itself. Compared with the HT, HHT adds one step, empirical mode decomposition (EMD). EMD decomposes a complex signal into several intrinsic mode functions (IMFs), ordered from high frequency to low frequency. An IMF must satisfy two conditions: (1) the number of extrema and the number of zero crossings are equal or differ by at most 1; (2) the local mean of the upper envelope defined by the maxima and the lower envelope defined by the minima is 0 (that is, the upper and lower envelopes are symmetric).
EMD successively extracts the locally highest-frequency component of the signal to obtain a series of IMFs. Each IMF is then Hilbert transformed, and the Hilbert marginal spectrum of each component is obtained.
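The step from IMFs to a Hilbert marginal spectrum can be illustrated with a short NumPy sketch. It builds the analytic signal of each IMF through the FFT, estimates instantaneous amplitude and frequency, and accumulates amplitude over time into frequency bins. The function names, the number of bins, and the choice to accumulate amplitude (rather than energy) are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], built by zeroing negative frequencies in the FFT."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * h)

def marginal_spectrum(imfs, fs, n_bins=64):
    """Accumulate instantaneous amplitude over time into bins between 0 and fs/2.

    Returns (bin_edges, spectrum). Frequencies outside [0, fs/2] are clipped
    into the first or last bin (a simplification for this sketch).
    """
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    spectrum = np.zeros(n_bins)
    for imf in imfs:
        z = analytic_signal(np.asarray(imf, dtype=float))
        amp = np.abs(z)[:-1]                        # instantaneous amplitude
        phase = np.unwrap(np.angle(z))              # instantaneous phase
        freq = np.diff(phase) * fs / (2.0 * np.pi)  # instantaneous frequency in Hz
        idx = np.clip(np.digitize(freq, edges) - 1, 0, n_bins - 1)
        np.add.at(spectrum, idx, amp)
    return edges, spectrum
```

For a pure tone the instantaneous frequency is constant, so all of the amplitude falls into the single bin that contains the tone frequency.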

Ensemble Empirical Mode Decomposition (EEMD)
The empirical mode decomposition (EMD) method was proposed by Huang. As a time-frequency analysis method, it is adaptive and localized in time and frequency: the IMFs are related to the sampling frequency, and the decomposition is driven by changes in the data itself. EMD is not subject to the limitations of the Fourier transform. Its most important drawback is mode mixing (modal aliasing), and Huang proposed EEMD to better solve this problem [15].
The EEMD algorithm steps are:
Step1: Add normally distributed white noise n(t) to the original signal: X'(t) = X(t) + n(t).
Step2: Treat the signal with white noise as a whole and decompose it by EMD to obtain the IMF components: X'(t) = c_1(t) + ... + c_N(t) + r(t).
Step3: Repeat Steps 1 and 2, adding a new normally distributed white noise sequence each time.
Here X(t) is the original signal, X'(t) is the noisy signal, r(t) is the remainder component of the decomposition, and c_i(t) represents each intrinsic mode function.
The distribution of the signal's extrema affects the IMFs: if the extrema are not uniformly distributed, mode mixing occurs. In essence, EEMD is a repeated empirical mode decomposition with superimposed Gaussian white noise. It exploits the statistical property that Gaussian white noise has a uniform frequency distribution: adding different white noise of the same amplitude each time changes the extremum characteristics of the signal, and averaging the corresponding IMFs from the multiple EMD runs cancels the added white noise, which effectively suppresses mode mixing.
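The three EEMD steps and the final averaging can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the envelopes use linear interpolation instead of the cubic splines usual in EMD, sifting runs for a fixed number of iterations, and the number of trials, noise amplitude, and number of IMFs are illustrative defaults rather than the values in the paper's parameter table.

```python
import numpy as np

def _envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if there are too few extrema."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    upper = np.interp(t, maxima, x[maxima])  # assumption: linear, not spline, envelopes
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0

def emd(x, max_imfs=6, n_sift=10):
    """Simplified EMD. Returns shape (max_imfs + 1, len(x)); the last row is the residue."""
    residue = np.asarray(x, dtype=float).copy()
    rows = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            rows.append(np.zeros_like(residue))  # nothing left to extract
            continue
        h = residue.copy()
        for _ in range(n_sift):
            mean = _envelope_mean(h)
            if mean is None:
                break
            h = h - mean                         # one sifting pass
        rows.append(h)
        residue = residue - h
    rows.append(residue)
    return np.array(rows)

def eemd(x, n_trials=50, noise_std=0.2, max_imfs=6, seed=0):
    """Average the EMD results of n_trials noisy copies of x (Steps 1 to 3, then the mean)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    scale = noise_std * x.std()
    total = np.zeros((max_imfs + 1, len(x)))
    for _ in range(n_trials):
        noisy = x + scale * rng.standard_normal(len(x))  # Step 1: add white noise
        total += emd(noisy, max_imfs=max_imfs)           # Step 2: decompose by EMD
    return total / n_trials                              # ensemble mean of the IMFs
```

Because every EMD run reconstructs its noisy input exactly, the rows returned by `eemd` sum to the original signal plus the mean of the added noise, which shrinks as `n_trials` grows.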

Implementation of MFCC algorithm based on HHT
The MFCC feature is based on a model of human sound perception derived from inner-ear frequency analysis, and the MFCC set provides a perceptually motivated, smooth estimate of the speech spectrum over time. In this paper, the FFT in MFCC feature extraction is replaced by the Hilbert-Huang transform (HHT), because HHT handles non-stationary, nonlinear signals well. EEMD introduces white noise into the signal to be analyzed; the spectrum of white noise is uniformly distributed, so the signal is automatically distributed onto the appropriate reference scales. Because the noise has zero mean, it cancels out after averaging over many trials, so the ensemble mean can be taken directly as the final result. The difference between the ensemble mean and the original signal decreases as the number of ensemble trials increases. The obtained Hilbert spectrum is passed through the Mel filter bank, the logarithm, and the DCT to obtain the HHT-MFCC hybrid feature parameters. The improved MFCC feature parameter extraction process based on EEMD is shown in Fig. 2. The experimental steps of feature extraction for ethnic original ecological music audio are as follows:
Step1: Preprocess each ethnic original music audio file.
Step2: Add normally distributed white noise to the original signal.
Step3: Treat the signal with added white noise as a whole and decompose it by EMD to obtain each IMF component.
Step4: Repeat Steps 2 and 3, adding a new normally distributed white noise sequence each time.
Step5: Take the ensemble average of the IMFs obtained in each trial as the final result.
Step6: Pass the Hilbert marginal spectrum of the IMFs obtained in Step 5 through the Mel filter bank, apply the logarithm, and finally perform the DCT to obtain the HHT-MFCC features.
The experimental data in this paper come from the project team of the authors' supervisor.
A total of 40 WAV files of original ecological music from 13 ethnic minorities in Yunnan, China, were obtained: Bai, Yi, Bulang, Deang, Pumi, Dai, Dulong, Jingpo, Wa, Lisu, Naxi, Hani, and Nu. Table 2 shows the total duration corresponding to each type of ethnic original ecological music. The algorithm parameters of the experiment are shown in Table 4.
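The last stage of the feature extraction procedure (Mel filter bank, logarithm, DCT) can be sketched as follows for a spectrum sampled on an arbitrary frequency axis, such as the bin centres of a marginal spectrum. The Mel formula used is the common 2595 * log10(1 + f / 700) convention; the filter count, the number of cepstral coefficients, the unnormalised DCT-II, and all function names are illustrative assumptions, since the paper's parameter table is not reproduced in this text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, freqs, f_max):
    """Triangular filters, shape (n_filters, len(freqs)), evenly spaced on the Mel scale."""
    freqs = np.asarray(freqs, dtype=float)
    points = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters + 2))
    bank = np.zeros((n_filters, len(freqs)))
    for k in range(n_filters):
        lo, mid, hi = points[k], points[k + 1], points[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_out):
    """First n_out coefficients of the unnormalised DCT-II of a 1-D vector."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return (np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * x).sum(axis=1)

def cepstrum_from_spectrum(spectrum, freqs, n_filters=24, n_ceps=13):
    """Mel filter bank, then logarithm, then DCT, applied to a non-negative spectrum."""
    bank = mel_filterbank(n_filters, freqs, freqs[-1])
    energies = bank @ np.asarray(spectrum, dtype=float)
    return dct2(np.log(energies + 1e-10), n_ceps)  # small offset avoids log(0)
```

Feeding an FFT magnitude spectrum into `cepstrum_from_spectrum` gives a conventional MFCC-style vector, while feeding a Hilbert marginal spectrum gives the HHT-based counterpart, so the same back end serves both feature types in a comparison.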

Experimental results
The experiment uses the HHT and MFCC fusion method to extract features from the audio of these 13 categories of ethnic original music, obtaining both the HHT-MFCC features and the traditional MFCC features. Fig. 3 shows the HHT-MFCC features of ethnic original music audio, from which it can be seen that the cepstral coefficients of each order differ considerably. The MFCC cepstrum in Fig. 4 is quite different from the HHT-MFCC cepstrum, but HHT-MFCC reproduces the envelope variation of the original signal well.
This paper uses K-means clustering to compare the clustering performance of HHT-MFCC and MFCC on three external clustering performance indexes: Purity, Entropy, and F-measure (Fβ). The traditional MFCC and HHT-MFCC feature parameters of the 13 types of ethnic original music are each clustered by K-means, and the three index values obtained for the two feature types are shown in Table 5.
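For reference, the K-means step can be sketched with plain Lloyd iterations in NumPy. The initialisation (k distinct samples chosen at random), the iteration limit, and the function name are assumptions for illustration; the paper does not state which K-means variant or settings were used.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centers) for feature matrix X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # assumed random initialisation
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):                  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

In a comparison like the one described here, the same routine would be run once on the MFCC feature vectors and once on the HHT-MFCC feature vectors with k set to the number of categories, and the resulting labels scored against the true categories.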
Table 5 shows that the HHT-MFCC feature is better than the MFCC feature alone on Purity, Entropy, and F-measure: relative to MFCC, HHT-MFCC is 0.49 higher in Purity, 1.2 lower in Entropy, and 3% higher in F-measure. Since higher Purity and F-measure and lower Entropy indicate better clustering, the improved HHT-MFCC features are superior to the traditional MFCC features.

Conclusion
Ethnic original ecological music in Yunnan is a treasure of national culture and art, and its classification and recognition are of great value. Considering the characteristics of the original ecological music of 13 ethnic groups in Yunnan, this paper proposes an MFCC feature extraction method based on the HHT. K-means clustering is applied to the features extracted from the 13 categories of ethnic original music audio, and the results show that the improved HHT-based features are better than the traditional MFCC features. Future work can address classification and recognition based on the HHT-MFCC features of ethnic original ecological music.