
Robust classification of heart valve sound based on adaptive EMD and feature fusion

  • Weibo Wang,

    Roles Conceptualization, Writing – original draft

    Affiliation Department of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan, China

  • Jin Yuan,

    Roles Writing – original draft

    Affiliation Department of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan, China

  • Bingrong Wang,

    Roles Data curation

    Affiliation Patent Examination Cooperation Sichuan Center of the Patent Office, CNIPA, Chengdu, Sichuan, China

  • Yu Fang,

    Roles Methodology

    Affiliation Department of Electrical Engineering and Electronic Information, Xihua University, Chengdu, Sichuan, China

  • Yongkang Zheng,

    Roles Methodology

    Affiliation State Grid Sichuan Electric Power Research Institute, Chengdu, Sichuan, China

  • Xingping Hu

    Roles Formal analysis

    632626965@qq.com

    Affiliation Sichuan Technology and Business University, Chengdu, Sichuan, China

Abstract

Cardiovascular disease (CVD) is one of the leading causes of death worldwide. In recent years, heart sound analysis has attracted increasing research attention as a means of diagnosing such disease. To effectively distinguish heart valve defects from normal heart sounds, adaptive empirical mode decomposition (EMD) and feature fusion techniques were used for heart sound classification. An adaptive EMD scheme was proposed in which the intrinsic mode function (IMF) components are screened using the correlation coefficient and the Root Mean Square Error (RMSE), and adaptive thresholds based on the Hausdorff Distance were then used to choose the IMF components for reconstruction. The multidimensional features extracted from the reconstructed signal were ranked and selected. The waveform, energy and complexity characteristics of the heart sound signal can indicate the state of cardiac activity corresponding to various heart sounds. Here, a set of ordinary features was extracted from the time, frequency and nonlinear domains. To obtain more discriminative features and better classification results, four additional cardiac reserve time features were fused. The fused features were ranked by six different feature selection algorithms, and three classifiers, random forest, decision tree, and K-nearest neighbor, were trained on an open source database and our own database. Extensive experimental evaluations show that, compared with previous work, the proposed method achieves the best results, with a highest accuracy of 99.3% (a 1.9% improvement in classification accuracy). These results verify the robustness and effectiveness of the fused features and the proposed method.

Introduction

Heart sound is a weak biological signal produced by the systolic and diastolic motion of the human heart. It is also a vibration signal with nonlinear and non-stationary characteristics. Rheumatic heart disease refers to the effect of rheumatic fever on the heart valves, which results in heart valve disease; its symptoms are stenosis or insufficiency of the mitral, tricuspid, and aortic valves. With the aging of the population, senile valvular disease, coronary heart disease, and valvular disease caused by myocardial infarction are becoming more and more common [1]. For a long time, feature extraction from heart sound signals has been a research hotspot in the biomedical field. Existing research has put forward a variety of heart sound feature extraction methods from the perspectives of the time domain, frequency domain, and time-frequency domain. Among them, wavelet analysis is widely used because of its excellent ability to represent local signal information in the time and frequency domains. For time-frequency feature extraction of heart sound signals, analysis based on the S transform, an extension of the wavelet transform and the STFT, overcomes the deficiencies of the latter two. However, most of these methods are based on the assumption of linear time-varying or time-invariant behavior. Because of the nonlinear and non-stationary characteristics of the heart sound signal, a linear analysis method is bound to ignore some important information inside the signal [2].

Empirical Mode Decomposition (EMD) is a method for processing non-stationary signals proposed by Huang [3], a scientist at NASA, as the essential part of the Hilbert-Huang Transform. Zhang et al. used EMD to remove wall components from mixed Doppler signals [4]; the method improves the ability to effectively and objectively remove wall components from composite signals. Time-frequency analysis based on EMD is suitable for nonlinear and non-stationary signals as well as for linear and stationary signals, and it can adaptively decompose any signal into multiple intrinsic mode functions (IMFs). Each IMF component contains the local characteristics of the original signal at a different time scale, so analyzing the IMF components can more accurately reflect the detailed characteristics of the original signal. Therefore, decomposing the complex heart sound signal with EMD and then extracting characteristic information from the decomposed IMF components can reflect the intrinsic nature of the heart sound.

Feature extraction and analysis of heart sound signals is a significant part of establishing a cardiovascular disease diagnosis system [5–7]. Different features can reflect the state of heart function from various aspects, so statistical analysis of heart sound signals can reveal the difference between heart valve defects and normal heart sounds and be used to discriminate different heart sound signals. Accordingly, ordinary feature sets are extracted from the time domain [8], the frequency domain [9] and nonlinear space [10, 11]. In addition to the above features, four cardiac reserve time features (T1, T2, T11 and T12) were also fused in this paper [12]. Feature selection methods in machine learning play an important role in biomedical data analysis [13]. Feature selection techniques can be roughly divided into three types: filter, embedded, and wrapper methods [14]. Filter methods, which are independent of classifier design, can be divided into two main groups, namely single-feature evaluation and subset evaluation [15]. Wrapper and embedded methods interact with classifiers to achieve feature selection [16].

In recent years, more and more researchers have adopted DL and ML for classification studies [17–20]. Commonly used classification algorithms include artificial neural networks (ANN) [20], support vector machines (SVM), random forests (RF), maximum likelihood classifiers (MLC) [21–24], decision trees [25], KNN [26], etc. Haq et al. have studied this area extensively, and models such as SMOTEDNN [27], CDLSTM [28], and DNNBoT [29] have played a central role in their classification research. Convolutional neural network (CNN) and recurrent neural network (RNN) methods, along with some conventional methods developed in the last five years, have been used extensively in heart sound classification [30].

This paper proposes a feature reconstruction algorithm and feature fusion method based on adaptive EMD to classify heart valve diseases. It can effectively distinguish heart valve defects from normal heart sounds by selecting an essential subset of features. The main contributions are outlined as follows.

  1. This paper improves an adaptive reconstruction method based on Hausdorff distance. After EMD transformation, the Hausdorff distance (HD) between IMFs and the original heart sound signal was calculated. Then, according to the adaptive threshold based on Hausdorff distance, the appropriate IMF components were selected to reconstruct the heart sound signal. The proposed method has a better noise reduction effect, and the reconstructed heart sound signal has more obvious feature information.
  2. A feature fusion method is proposed, which extracts not only the time domain, frequency domain, and nonlinear features but also fuses four cardiac reserve times features. The proposed feature fusion method can improve the effectiveness and accuracy of heart sound classification.

Algorithm design of preprocessing

Fig 1 shows the overall block diagram of the method, which contains four main parts: preprocessing of the original heart sound, feature extraction, feature screening, and classification. Heart sound is a weak physiological signal, and various types of interference during acquisition inevitably introduce noise of varying magnitude. Such noise may mask the original characteristics of the signal and affect the subsequent analysis of the different heart sound types. The noise mainly includes environmental noise, power frequency noise, friction between the skin and the collection equipment, instrument interference, etc. Therefore, to retain as much of the useful signal as possible, it is necessary to preprocess the original signal.

A. EMD

The Hilbert-Huang transform comprises the Huang transform and Hilbert spectrum analysis. The Huang transform is also called Empirical Mode Decomposition (EMD) [10, 11]. EMD, as a nonlinear and non-stationary signal analysis method, can decompose the heart sound signal into several intrinsic mode functions, and each IMF component carries local characteristics of the signal. Unlike the wavelet transform, which requires selecting a wavelet function in advance, EMD adaptively decomposes the signal into IMF components ordered from high to low frequency. Each IMF contains the local characteristics of the original signal at a different time scale. An IMF component must satisfy two restrictions: over the whole sequence, the number of extreme points a and the number of zero-crossing points b satisfy |a−b| ≤ 1; and at any point, the mean value of the upper and lower envelopes defined by the local maxima and minima must be 0.

For a time-series signal s(t), the principle of EMD is as follows [31]:

  1. Determine the local maximum and minimum points of the original heart sound signal s(t), and fit the upper and lower envelopes e1(t) and e2(t), shown as the yellow and green curves in Fig 2.
  2. Obtain the mean curve m(t) of e1(t) and e2(t), shown as the red curve in Fig 2:
    m(t) = [e1(t) + e2(t)] / 2 (1)
    Subtract m(t) from s(t) to get:
    h1(t) = s(t) − m(t) (2)
    If h1(t) does not satisfy the IMF conditions above, treat h1(t) as the new s(t) and repeat the above steps until a component hk(t) satisfying the restrictions is obtained; record it as the first IMF component c1(t).
  3. Calculate the residual r1(t) = s(t) − c1(t), use r1(t) as the new original signal, and repeat the above steps until the decomposition ends, finally obtaining n IMF components and a remainder r(t). The original signal can then be expressed as:
    s(t) = Σ_{i=1}^{n} ci(t) + r(t) (3)
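For readers who want to reproduce this step, the following is a minimal Python sketch of the sifting procedure described above, using numpy and scipy for extrema detection and cubic-spline envelopes. The stopping tolerance, maximum sift count, and endpoint handling are simplifying assumptions rather than the exact settings used in this paper; a mature package such as PyEMD can be substituted.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One sifting pass: subtract the mean of the upper/lower spline envelopes (Formulas (1)-(2))."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: x is a residual trend, not an IMF
    # pin the envelopes to the record endpoints (simple boundary handling, an assumption)
    upper = CubicSpline(np.r_[0, maxima, len(x) - 1], np.r_[x[0], x[maxima], x[-1]])(t)
    lower = CubicSpline(np.r_[0, minima, len(x) - 1], np.r_[x[0], x[minima], x[-1]])(t)
    m = (upper + lower) / 2.0        # mean envelope, Formula (1)
    return x - m                     # candidate IMF, Formula (2)

def emd(signal, max_imfs=10, max_sift=50, sd_tol=0.2):
    """Minimal EMD: returns (list of IMFs, residual r(t)) whose sum reproduces Formula (3)."""
    imfs, residual = [], np.asarray(signal, dtype=float).copy()
    for _ in range(max_imfs):
        h = residual
        for _ in range(max_sift):
            h_new = sift_once(h)
            if h_new is None:
                return imfs, residual
            # standard-deviation style stopping criterion for the sifting loop
            if np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12) < sd_tol:
                h = h_new
                break
            h = h_new
        imfs.append(h)
        residual = residual - h
    return imfs, residual
```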

Fig 3 shows the time- and frequency-domain waveforms of the IMF components obtained by applying EMD to an abnormal heart sound signal from a valve defect (aortic stenosis, AS). The number of IMF components obtained by decomposition differs from signal to signal; the figure shows only the first ten components (c1-c10) and the remainder r of the empirical mode decomposition. As can be observed from Fig 3(B), after EMD processing the original heart sound signal, which contains multiple frequency bands, is decomposed into multi-layer IMF components in descending order of frequency, and the frequency content of the heart sound signal is mainly concentrated in the low-frequency part.

Fig 3. The EMD decomposition of heart sound.

(a) Time domain; (b) Frequency domain.

https://doi.org/10.1371/journal.pone.0276264.g003

B. IMF selection and reconstruction

Since the collection of heart sound signals inevitably generates noise due to various interferences, the original characteristics of the heart sound signal will be masked by the noise. It can be seen from Fig 3 that the essential information of the original signal is often concentrated in a few IMF components, and the interference that noise causes to the signal characteristics can be effectively reduced by screening suitable IMF component signals. We choose two evaluation indicators: the Correlation Coefficient [32, 33] and the Root Mean Square Error [34].

1) Correlation and root mean squared error.

The correlation coefficient based on EMD is computed as follows:

corr_i = Σ_j [s(j) − s̄][ci(j) − c̄i] / sqrt( Σ_j [s(j) − s̄]² · Σ_j [ci(j) − c̄i]² ) (4)

In Formula (4), corri is the correlation coefficient between the original heart sound signal s(j) and the i-th IMF component, j indexes the samples of the signal, ci(j) is the j-th sample of the i-th IMF component obtained by EMD, and s̄ and c̄i are the respective means, where i = 1, 2, …, L. The larger corri is, the higher the correlation.

The RMSE based on EMD is computed as follows:

RMSE_i = sqrt( (1/N) Σ_{j=1}^{N} [s(j) − ci(j)]² ) (5)

In Formula (5), RMSEi is the Root Mean Square Error between the original heart sound signal s(j) and the i-th IMF component over the N samples, often used as an error measure. The smaller the RMSE value, the closer the component is to the original signal.

According to the Corri and RMSEi between each component signal and the original signal, the thresholds λ and δ are calculated by Formula (6). The judgment condition of Formula (7) is then used for adaptive threshold screening, retaining the practical IMF components whose correlation coefficient is greater than or equal to λ and whose Root Mean Square Error is smaller than δ:

corr_i ≥ λ and RMSE_i < δ (7)

Because the number of components L obtained from the decomposition of each signal is different, the thresholds λ and δ differ from signal to signal, and so does the number of IMF components that pass the screening.

Since each IMF component carries noise, which is uncorrelated with the heart sound signal and deviates significantly from it, this article uses an adaptive scheme to select the good IMF components and reconstruct them into a sub-signal for subsequent feature extraction.

2) Algorithm flow.

Fig 4 shows the adaptive reconstruction. First, the Corr and RMSE between each IMF component and the original heart sound signal are calculated. The best component signals are then selected according to the adaptive threshold rule, and finally the heart sound sub-signal is reconstructed. The reconstructed sub-signal concentrates the useful information of the original heart sound and prepares for the extraction of the various characteristic parameters. Fig 6(A) and 6(B) respectively show the correlation coefficients and RMSE of the first nine IMF layers obtained by decomposing the AS, MS, MR, MVP, and NHS heart sound signals and the ASD, VSD, TOF, and NHS heart sound signals. The dotted lines represent the adaptive thresholds λ and δ. IMF components with correlation below λ or RMSE above δ are interference components of the original signal and are shown as black bars; IMF components with correlation above λ and RMSE below δ are the most strongly signal-correlated components and are shown as yellow and blue bars, respectively.
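A compact sketch of this screening step is given below. Formulas (4), (5) and (7) are implemented directly; because the exact definition of the thresholds λ and δ in Formula (6) is not reproduced here, the means of the per-component values are used as placeholder thresholds, which is an assumption.

```python
import numpy as np

def select_imfs_corr_rmse(imfs, signal):
    """Screen IMFs by correlation (Formula (4)) and RMSE (Formula (5)), then reconstruct."""
    signal = np.asarray(signal, dtype=float)
    corr = np.array([np.corrcoef(signal, imf)[0, 1] for imf in imfs])          # Formula (4)
    rmse = np.array([np.sqrt(np.mean((signal - imf) ** 2)) for imf in imfs])   # Formula (5)
    lam, delta = corr.mean(), rmse.mean()   # placeholder thresholds standing in for Formula (6)
    keep = [i for i in range(len(imfs)) if corr[i] >= lam and rmse[i] < delta]  # Formula (7)
    reconstructed = np.sum([imfs[i] for i in keep], axis=0) if keep else signal.copy()
    return keep, reconstructed
```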

Fig 4. The flowchart based on EMD with adaptive reconstruction.

https://doi.org/10.1371/journal.pone.0276264.g004

C. Improved adaptive reconstruction based on Hausdorff distance

Building on the adaptive IMF reconstruction method above, this paper further finds that using the first seven layers of IMF components can optimize the results. Therefore, the Hausdorff Distance (HD) between each IMF and the original signal is calculated, and the appropriate IMF components are then adaptively selected for reconstruction [35].

1) Hausdorff distance.

The Hausdorff Distance (HD) is the max-min distance between two geometric objects, used to measure their degree of resemblance [8]. Given two nonempty point sets A = {a1, a2, …, am} and B = {b1, b2, …, bn}, the HD between A and B is formulated as

H(A, B) = max{ h(A, B), h(B, A) } (8)

where

h(A, B) = max_{a∈A} min_{b∈B} ||a − b|| (9)

h(B, A) = max_{b∈B} min_{a∈A} ||b − a|| (10)

and ||·|| is a norm distance metric, here the Euclidean distance. H(A, B) denotes the bidirectional HD between A and B, which is the fundamental form of the HD; h(A, B) is called the directed HD from A to B, and h(B, A) is called the directed HD from B to A.
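The directed distances and the bidirectional HD of Formulas (8)–(10) can be computed directly, for example as in the sketch below. The O(mn) pairwise-distance matrix is fine for short point sets; for long records, scipy.spatial.distance.directed_hausdorff is a more memory-friendly alternative. Representing each signal sample as a 2-D point (time index, amplitude) is an assumption of this sketch, since the embedding is not spelled out in the text.

```python
import numpy as np
from scipy.spatial.distance import cdist

def directed_hd(A, B):
    """h(A, B) = max over a in A of min over b in B of ||a - b||  (Formulas (9) and (10))."""
    return np.max(np.min(cdist(A, B), axis=1))

def hausdorff_distance(A, B):
    """H(A, B) = max{h(A, B), h(B, A)}  (Formula (8))."""
    return max(directed_hd(A, B), directed_hd(B, A))

def signal_as_points(x):
    """Embed a 1-D signal as 2-D points (time index, amplitude); an assumed representation."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.arange(len(x), dtype=float), x])
```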

2) Proposed algorithm flow.

The proposed algorithm flow is as follows. As shown in Fig 5, this paper applies an improved Hausdorff adaptive threshold to the IMF components and further selects the more appropriate IMF components, as shown in Formula (11):

Fig 5. Heart sound flowchart based on EMD adaptive HD reconstruction.

https://doi.org/10.1371/journal.pone.0276264.g005

ε = (1/7) Σ_{i=1}^{7} HD(ci(t), s(t)) (11)

where ci(t) is the i-th IMF component, s(t) is the original signal, ε is the adaptive threshold, and L is the number of IMF components (when L < 7, the mean is taken over the L available layers).

Fig 5 shows the improved adaptive threshold selection method based on HD. First, after all the IMF components are obtained by EMD, the HD between each layer and the original signal is calculated, and the IMF components of the first seven layers are sorted by their HD values. The appropriate IMF components can then be adaptively selected by comparing each HD value against the mean HD of the first seven layers. Finally, the heart sound sub-signal is reconstructed.
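Putting the pieces together, a sketch of the adaptive HD selection follows. It reuses hausdorff_distance and signal_as_points from the previous sketch and takes the mean HD of the first seven layers as the threshold ε, as described above; treating the seven-layer limit as a parameter is an assumption for generality.

```python
import numpy as np

def select_imfs_adaptive_hd(imfs, signal, n_layers=7):
    """Keep the IMFs among the first n_layers whose HD to the original signal
    is no larger than the mean HD of those layers (adaptive threshold, Formula (11))."""
    sig_pts = signal_as_points(signal)
    k = min(n_layers, len(imfs))
    hd = np.array([hausdorff_distance(signal_as_points(imfs[i]), sig_pts) for i in range(k)])
    eps = hd.mean()                                     # adaptive threshold ε
    keep = [i for i in range(k) if hd[i] <= eps]        # smaller HD = more similar to the original
    reconstructed = np.sum([imfs[i] for i in keep], axis=0)
    return keep, reconstructed
```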

Figs 6 and 7 show the comparison results of the two IMF component selection methods. In Fig 6(A), the Corr & RMSE threshold selection rule of Formula (7) applied to the abnormal heart sound AS selects the 2nd-5th layers of IMFs to reconstruct the signal. In Fig 7(A), following Fig 5, the 2nd-5th IMF layers of AS, whose HD values are smaller than the adaptive Hausdorff Distance threshold ε, are selected.

Fig 6. Correlation and RMSE of IMFs of heart sound.

(a) Correlation and RMSE of IMF of AS, MS, MR, MVP; (b) Correlation and RMSE of IMFs of ASD, VSD, TOF&NHS.

https://doi.org/10.1371/journal.pone.0276264.g006

Fig 7. Adaptive selection based on the mean of the first 7 layers of HD.

(a) Adaptive Hausdorff Distance of IMFs of AS, MS, MR, MVP; (b) Adaptive Hausdorff Distance of IMFs of ASD, VSD, TOF&NHS.

https://doi.org/10.1371/journal.pone.0276264.g007

Fig 8 shows the time-frequency waveforms of the heart sound signal before and after EMD adaptive filtering and reconstruction. It can be seen from the red box markings that the noise in the high-frequency part of the heart sound is effectively suppressed by the reconstruction. Therefore, the reconstruction operation can effectively reduce the noise of the heart sound signal and select heart sound sub-signals that are more similar to the original heart sound.

Feature fusion and selection

A. Feature fusion

Different features can reflect the state of heart function from various aspects. Therefore, statistical analysis of heart sound signals can capture the difference between heart valve defect sounds and normal heart sound signals, which can be used to discriminate different heart sound signals. Existing studies have shown that the waveform transformation, energy and complexity characteristics of heart sound signals can reflect the cardiac activity state corresponding to different heart sound samples. Therefore, this section extracts complementary feature sets from the time domain, frequency domain and nonlinear domain and combines them with four cardiac reserve time features; these features are then used for subsequent feature screening and classification.

1) Time domain features.

Time-domain features are extracted from the perspectives of mathematical statistics, vibration-signal feature analysis, and the heart sound time-domain waveform. There are 25 kinds of time-domain features, including Mean, Mean Square, Maximum, Minimum, Variance, Standard Deviation, Root Mean Square (RMS), Peak-to-Peak Value, Root-Square Amplitude, Skewness Factor, Kurtosis Factor, Form Factor, Crest Factor, Impulse Factor, Margin Degree Factor, Hjorth Parameter (Mobility), Hjorth Parameter (Complexity), First Quartile, Second Quartile, Third Quartile, and Interquartile Range.
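As an illustration, a few of these statistics can be computed as below; this is a sketch of representative features, not the paper's complete 25-feature implementation, and the small epsilon guards are an added assumption.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def time_features(x):
    """A handful of representative time-domain statistics for one heart sound segment."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    abs_mean = np.mean(np.abs(x))
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {
        "mean": np.mean(x),
        "variance": np.var(x),
        "rms": rms,
        "peak_to_peak": np.ptp(x),
        "skewness": skew(x),
        "kurtosis": kurtosis(x),
        "form_factor": rms / (abs_mean + 1e-12),            # RMS over mean absolute value
        "crest_factor": np.max(np.abs(x)) / (rms + 1e-12),   # peak over RMS
        "impulse_factor": np.max(np.abs(x)) / (abs_mean + 1e-12),
        "interquartile_range": q3 - q1,
    }
```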

2) Frequency domain features.

In addition to time-domain analysis, the Fourier transform can be used to observe frequency-domain characteristics of the signal that cannot be obtained in the time domain.

This paper uses analysis methods such as the DFT and the Hilbert transform to extract 11 features from the frequency perspective, including Energy Entropy, Shannon Entropy, Mean Frequency, Center of Gravity Frequency, Root Mean Square Frequency, Frequency Deviation, Instantaneous Energy Median, Average Instantaneous Energy, Median Instantaneous Frequency, and Average Instantaneous Frequency.
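A sketch of some of these spectral and Hilbert-based quantities follows. The exact definitions used in the paper (e.g., for frequency deviation) are not spelled out, so conventional formulations are used and the feature-name mapping is an assumption; the default sampling rate of 2205 Hz matches the resampled recordings described later.

```python
import numpy as np
from scipy.signal import hilbert

def frequency_features(x, fs=2205.0):
    """Representative frequency-domain and instantaneous (Hilbert) features."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x)) ** 2                 # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spectrum / (np.sum(spectrum) + 1e-12)              # normalized spectral distribution
    feats = {
        "mean_frequency": np.sum(freqs * p),               # power-weighted mean / centroid
        "rms_frequency": np.sqrt(np.sum(freqs ** 2 * p)),
        "shannon_entropy": -np.sum(p * np.log2(p + 1e-12)),
    }
    analytic = hilbert(x)                                  # analytic signal via Hilbert transform
    inst_energy = np.abs(analytic) ** 2
    inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
    feats["average_instantaneous_energy"] = np.mean(inst_energy)
    feats["median_instantaneous_frequency"] = np.median(inst_freq)
    return feats
```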

3) Nonlinear domain features.

This kind of information describes the state of motion of the signal. Information entropy characterizes the complexity of data from the perspective of information theory, and extending the analysis to the nonlinear domain yields corresponding indicators that describe the signal data. Four kinds of nonlinear characteristics are used: Approximate Entropy [36], Sample Entropy, Multi-Scale Permutation Entropy, and Exponential Entropy.
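As one example of these nonlinear measures, a standard sample entropy sketch is given below. The embedding dimension m and tolerance r are conventional defaults rather than the paper's settings, and the vectorized pairwise comparison is O(N²) in memory, so it suits short segments.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Standard sample entropy: -ln(A/B), where B and A count template matches of
    length m and m+1 (Chebyshev distance <= r, self-matches excluded)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)                 # common default tolerance
    N = len(x)

    def match_count(mm):
        templates = np.array([x[i:i + mm] for i in range(N - mm + 1)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return np.sum(dist <= r) - len(templates)   # subtract the diagonal self-matches

    B = match_count(m)
    A = match_count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```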

As shown in Table 1 below, the feature set extracted in our paper includes the above 25 types of time-domain features, 11 types of frequency-domain features, and four types of nonlinear characteristics. After feature extraction, 40 types of features are labelled to facilitate subsequent feature screening experiments.

4) Cardiac reserve time features.

Based on the established single-degree-of-freedom vibration model, heart sound auscultation is treated as the duration of the low-frequency sound or sound pressure captured at the tympanic membrane, from which the corresponding time-limit features are extracted. T1 is the duration of S1, T2 is the duration of S2, T12 is the interval from S1 to S2 within the same cardiac cycle, and T11 is the duration of one cardiac cycle, i.e., the interval from S1 to the S1 of the next adjacent cycle.
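Given S1/S2 boundaries from a prior segmentation step, the four features reduce to simple time differences, as in the sketch below. The segmentation itself is outside the scope of this sketch, and the argument names are hypothetical.

```python
def cardiac_reserve_times(s1_onset, s1_offset, s2_onset, s2_offset, next_s1_onset):
    """Compute T1, T2, T12, T11 (in seconds) for one cardiac cycle from segmented boundaries."""
    T1 = s1_offset - s1_onset          # duration of S1
    T2 = s2_offset - s2_onset          # duration of S2
    T12 = s2_onset - s1_onset          # S1-to-S2 interval within the cycle
    T11 = next_s1_onset - s1_onset     # one full cardiac cycle: S1 to the next S1
    return T1, T2, T12, T11
```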

B. Feature selection

The multiple features may include related, irrelevant, and redundant features. Therefore, selecting features that are beneficial to learning classification from all features is necessary. Feature selection methods can be divided into three types:

  • Filter: each feature is scored according to divergence or correlation, a threshold or a target number of features is set, and the features are selected accordingly.
  • Wrapper: according to an objective function (usually the prediction performance), several features are selected or excluded at a time.
  • Embedded: a machine learning algorithm or model is first trained to obtain each feature's weight coefficient, and features are then selected from large to small according to the coefficients. It is similar to the Filter method, but the pros and cons of the features are determined through training.

The main goals of feature selection are to increase accuracy, find the minimal effective feature subset, and improve the efficiency of evaluation. This paper therefore selects six feature screening and ranking algorithms [37]: the filter methods mRMR, KCCAmRMR, MIC, and QPFS, the wrapper method RFECV, and an embedded tree-based model.

1) mRMR.

The Minimum Redundancy-Maximum Relevance (mRMR) algorithm is a filter feature selection method [38, 39]. It balances relevance and redundancy using mutual information as the criterion to measure the redundancy between features and the relationship between features and class variables. Features are selected by maximizing their relevance to the class variables while minimizing the redundancy among the selected features.

2) KCCAmRMR.

An improved algorithm, Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR) [40], has also been developed, in which irrelevant redundancy is filtered out by an additional kernel canonical correlation analysis so that only the relevant redundancy is considered in the subsequent mRMR procedure. The feature selection criterion of KCCAmRMR has the same two terms as mRMR: relevance and redundancy.

3) QPFS.

Quadratic programming feature selection (QPFS) is a feature ranking algorithm that uses information theory as the similarity measure [41]. It applies an optimization formulation to estimate the quality of the features of a given dataset. QPFS assigns a weight to each feature such that more important features receive larger values, and the features are then sorted in decreasing order of importance. By applying a threshold, the top features are retained as the final selection.

4) MIC.

MIC can quantify the correlation between continuous and qualitative variables, i.e., it can calculate the correlation between a continuous feature and a qualitative target variable [42, 43]. The larger the MIC value, the stronger the discriminative ability of the corresponding feature. This algorithm calculates the correlation between each feature dimension and the heart sound sample label, from which the essential features can be selected.

5) Tree-based model.

Tree-based prediction models can be used to calculate the importance of features, so they can be used to remove irrelevant features.

6) RFECV.

The RFECV method consists of two parts [44]. Recursive feature elimination is used to evaluate the importance of the features, and Cross-Validation (CV) then selects the optimal number of features after the feature evaluation.
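To make the embedded and wrapper categories concrete, the sketch below ranks the fused features with scikit-learn's random forest importances and RFECV. The estimator sizes, CV settings, and scoring are illustrative assumptions, and the filter methods (mRMR, KCCAmRMR, QPFS, MIC) are not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

def rank_features(X, y, random_state=0):
    """Rank features two ways: embedded (tree importances) and wrapper (RFECV).
    X is the (n_samples x 40) fused feature matrix, y the heart sound class labels."""
    # Embedded: impurity-based importances from a random forest
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    tree_ranking = np.argsort(rf.feature_importances_)[::-1]        # most important first

    # Wrapper: recursive feature elimination with cross-validation
    rfecv = RFECV(estimator=RandomForestClassifier(n_estimators=100, random_state=random_state),
                  step=1, cv=StratifiedKFold(5), scoring="accuracy")
    rfecv.fit(X, y)
    rfecv_ranking = np.argsort(rfecv.ranking_)   # ranking_ == 1 marks the selected subset; it sorts first
    return tree_ranking, rfecv_ranking
```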

Experimental results and analysis

A. Data source

The heart sound data used in our experiment come from an open source dataset ([Online] Available: https://github.com/yaseen21khan/) [25] and data collected in our laboratory, where heart sound signals from heart valve defects were collected clinically. All recordings were resampled to 2205 Hz. The open source samples include Aortic stenosis (AS), Mitral regurgitation (MR), Mitral stenosis (MS), Mitral valve prolapse (MVP), and normal heart sounds (NHS), for a total of 1000 samples. The laboratory data consist of 412 samples covering atrial septal defect (ASD), ventricular septal defect (VSD), tetralogy of Fallot (TOF), and NHS. Tables 2 and 3 show the data sources. The dataset was split into training data (80%) and testing data (20%).

B. Feature screening model comparison

This paper extracts 40 features from the heart sounds reconstructed by adaptive EMD, fusing features from the time, frequency and nonlinear domains with the cardiac reserve time features, so each heart sound sample yields a 40-dimensional fused feature vector. The open source dataset includes five types of heart sounds. To validate the proposed feature fusion technique under the various EMD methods, feature sets with and without the four cardiac reserve time features were tested. Six feature screening methods are used to rank feature importance, performing feature reduction to prepare the input for the subsequent classifier.

Similarly, the 412 samples collected in the laboratory were subjected to the same procedure to build a 412×40 feature set. After removing the four cardiac reserve time features, the size of the remaining feature set is 412×36.

C. Classification accuracy

From the heart sound signals in the open source database and our lab, the 36/40 features ranked by mRMR, KCCAmRMR, QPFS, MIC, Tree, and RFECV were incrementally fed into the RF classifier [45, 46]. The curves of average classification accuracy versus the number of top-ranking features, based on 10-fold cross-validation, are shown in Figs 9(A)-9(F) for the open source datasets and Figs 10(A)-10(F) for our lab's datasets.

Fig 9. The average classification accuracy versus the number of top-ranking features selected by mRMR, KCCAmRMR, QPFS, MIC and Tree and RFECV using RF classifier from open source data.

https://doi.org/10.1371/journal.pone.0276264.g009

Fig 10. The average classification accuracy versus the number of top-ranking features selected by mRMR, KCCAmRMR, QPFS, MIC and Tree and RFECV using RF classifier from our Lab data.

https://doi.org/10.1371/journal.pone.0276264.g010

In Figs 9 and 10, each graph is composed of six polylines, each representing one signal composition method. The composition methods are as follows:

  1. Features are extracted directly from the original heart sound, without the EMD reconstruction process.
  2. Corr & RMSE is used to select the EMD-reconstructed IMF components, and 36 features are then extracted.
  3. The original HD is used to select the EMD-reconstructed IMF components, and 36 features are extracted.
  4. Corr & RMSE is used to select the EMD-reconstructed IMF components, and 40 features are extracted.
  5. After EMD, the IMF components are adaptively selected by the HD threshold of the first seven layers and reconstructed, and 40 features are then extracted.
  6. The original HD is used to select and reconstruct the IMF components, and 36 features are extracted. The six feature screening and sorting algorithms are applied, and the results are finally input into the classifier.

The comparison of the six feature selection methods in the classification experiment verifies the data mining effect of the fused feature sets. The main idea of this experimental part is as follows. First, the heart sound is preprocessed and the signal reconstructed by the different methods. Second, the 36-feature sets (without the four cardiac reserve time features specific to heart sound signals) and the 40-feature sets are compared under the six feature screening algorithms mRMR, KCCAmRMR, QPFS, MIC, the tree-based model, and RFECV; the multidimensional features are sorted so that each data set's features are ranked from high to low informativeness. Finally, the highest-ranked features are gradually input into the classifier to obtain the classification results, as sketched below.
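A minimal sketch of this evaluation loop is given below, assuming X is the n_samples × 40 fused feature matrix, y the class labels, and ranking one ranker's ordering of feature indices; the forest size and random seed are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def accuracy_vs_top_k(X, y, ranking, max_k=None, cv=10, random_state=0):
    """Feed the top-k ranked features into the RF classifier and record the
    10-fold cross-validated accuracy for each k (one curve as in Figs 9 and 10)."""
    max_k = max_k or X.shape[1]
    accuracies = []
    for k in range(1, max_k + 1):
        cols = ranking[:k]                                   # top-k features from one ranker
        clf = RandomForestClassifier(n_estimators=200, random_state=random_state)
        acc = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
        accuracies.append(acc)
    return np.array(accuracies)
```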

Fig 9 shows the average classification accuracy on the open source data. Under the different feature selection algorithms and preprocessing methods, the corresponding average classification accuracy rises as the feature dimension increases, and after about ten dimensions the classification accuracy of each method flattens out. It can be seen from the figure that adaptive HD performs well under all six feature screening methods.

Fig 10 shows the average classification accuracy on our laboratory data. The collected data contain more noise interference than the open source data. It can be observed that, among the six signal reconstruction methods, adaptive HD achieves obviously higher classification accuracy than the other reconstruction methods under all six feature selection methods. Comparing the blue curve and the black curve, it can be found that the peak of the average accuracy curve for the 36 extracted features (without the T1, T2, T11, and T12 cardiac reserve time features) is lower than that for the 40 extracted features (including T1, T2, T11, and T12). Therefore, the results of feature fusion are better than those of the ordinary features alone.

Tables 4–9 compare the six different methods with the six feature selection algorithms on the open-source and our laboratory datasets. The feature fusion technique is also tested by comparing 36 features (without the T1, T2, T11, and T12 cardiac reserve time features) and 40 features (including T1, T2, T11, and T12). As shown in Tables 4–9, the No EMD method gives the worst results, while our proposed method, EMD + Adaptive HD with 40 fusion features, achieves the best and most robust results on both datasets. For the open source database, the feature selection method Tree combined with the proposed adaptive HD obtains the best results: the classification accuracy reaches 99.3%, with the selected features reduced from 40 to 14 in Table 8. On the laboratory dataset, the classification accuracy reaches 76.21%, with the selected features reduced from 40 to 12 in Table 6.

Table 4. Comparison of accuracy and feature dimension under different methods based on mRMR.

https://doi.org/10.1371/journal.pone.0276264.t004

Table 5. Comparison of accuracy and feature dimension under different methods based on KCCAmRMR.

https://doi.org/10.1371/journal.pone.0276264.t005

Table 6. Comparison of accuracy and feature dimension under different methods based on QPFS.

https://doi.org/10.1371/journal.pone.0276264.t006

Table 7. Comparison of accuracy and feature dimension under different methods based on MIC.

https://doi.org/10.1371/journal.pone.0276264.t007

Table 8. Comparison of accuracy and feature dimension under different methods based on tree.

https://doi.org/10.1371/journal.pone.0276264.t008

Table 9. Comparison of accuracy and feature dimension under different methods based on RFECV.

https://doi.org/10.1371/journal.pone.0276264.t009

The proposed method was run on a Legion R9000P2021H computer with an AMD Ryzen 7 processor under Windows 10. Using EMD + Adaptive HD, the classification of heart valve sound takes about 10 s per sample of the open source dataset.

Table 10 shows the classification results using different classifiers (RF, KNN, DT) on the open-source and our laboratory databases, with the 40 fusion features extracted for each dataset. It can be seen that RF achieves the best results on both datasets: the heart sound classification accuracy on the open source database reaches 99.3% when 14 features are selected, and that on the laboratory database reaches 77.18% when 12 features are selected.

Table 10. The best accuracy and number of selected minimum features by adaptive HD method.

https://doi.org/10.1371/journal.pone.0276264.t010

Table 11 shows the comparison with previous research on the open source Yaseen dataset. As can be seen from Table 11, Yaseen obtained an accuracy of 97.9% with SVM in 2018 [25]. In 2020, Baghel achieved an accuracy of 98.6% using a CNN [47], and Oh attained an accuracy of 98.2% with WaveNet [48]. Our proposed method achieves the best accuracy compared with the above algorithms.

Conclusion

This paper used adaptive Empirical Mode Decomposition (EMD) and a feature fusion strategy to classify heart sounds so that heart sounds with valve defects can be effectively distinguished from normal heart sounds. Several feature selection methods and classifiers were chosen for comparison. Experimental tests on two databases validated the effectiveness of the proposed IMF reconstruction method with adaptive Hausdorff Distance thresholds. The proposed feature fusion technique with 40 features, including ordinary features and cardiac reserve time features, achieves robust and excellent results. The experimental results show that the proposed adaptive EMD and feature fusion methods are of great value for further realizing computer-aided clinical diagnosis of heart disease.

Although the method performs well on the public dataset, the number and types of samples collected in our laboratory are still insufficient. To achieve better experimental results, more samples need to be collected to verify the robustness and effectiveness of the proposed algorithm.

Some aspects can still be optimized and will be studied in the future, such as obtaining more heart disease datasets, using ML and AI techniques to analyze heart sounds and improve the accuracy of heart sound classification, and reducing the time cost of the algorithm.

References

  1. Hu SH, Gao R, Liu L, Zhu M, Wang W, Wang Y, et al. Summary of the 2018 Report on Cardiovascular Diseases in China. Chinese Circulation. 34:209–20.
  2. Li BB, Yuan ZF. Non-linear and chaos characteristics of heart sound time series. Proceedings of the Institution of Mechanical Engineers Part H-Journal of Engineering in Medicine. 2008;222(H3):265–72. pmid:18491696
  3. Huang W, Shen Z, Huang NE, Fung YC. Engineering analysis of biological variables: an example of blood pressure over 1 day. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(9):4816–21. pmid:9560185
  4. Zhang YF, Gao Y, Wang L, Chen JH, Shi XL. The removal of wall components in Doppler ultrasound signals by using the empirical mode decomposition algorithm. IEEE Transactions on Biomedical Engineering. 2007;54(9):1631–42. pmid:17867355
  5. Mathew B, Francis L, Kayalar A, Cone J. Obesity: effects on cardiovascular disease and its diagnosis. Journal of the American Board of Family Medicine. 2008;21(6):562–8. pmid:18988724
  6. Al-Utaibi KA, Abdulhussain SH, Mahmmod BM, Naser MA, Alsabah M, Sait SM. Reliable recurrence algorithm for high-order Krawtchouk polynomials. Entropy. 2021;23(9). pmid:34573787
  7. Mahmmod BM, Abdulhussain SH, Suk T, Hussain A. Fast computation of Hahn polynomials for high order moments. IEEE Access. 2022;10:48719–32.
  8. Wang M, Chen JY, Gao F, Zhao JY. Space moving target detection using time domain feature. Optoelectronics Letters. 2018;14(1):67–70.
  9. Wang K, Yong B, Xue ZH. Frequency domain-based features for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters. 2019;16(9):1417–21.
  10. Ding Y, Li N, Zhao Y, Huang K. Image quality assessment method based on nonlinear feature extraction in kernel space. Frontiers of Information Technology & Electronic Engineering. 2016;17(10):1008–17.
  11. Chen QH, Li LQ, Qian T. Time-frequency transform involving nonlinear modulation and frequency-varying dilation. Complex Variables and Elliptic Equations. 2020;65(11):1800–13.
  12. Jiang ZW, Choi SJ. A cardiac sound characteristic waveform method for in-home heart disorder monitoring with electric stethoscope. Expert Systems with Applications. 2006;31(2):286–98.
  13. Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–17. pmid:17720704
  14. Rodriguez-Galiano VF, Luque-Espinar JA, Chica-Olmo M, Mendes MP. Feature selection approaches for predictive modelling of groundwater nitrate pollution: an evaluation of filters, embedded and wrapper methods. Science of the Total Environment. 2018;624:661–72. pmid:29272835
  15. Alirezanejad M, Enayatifar R, Motameni H, Nematzadeh H. Heuristic filter feature selection methods for medical datasets. Genomics. 2020;112(2):1173–81. pmid:31276753
  16. Liu C, Wang WY, Zhao Q, Shen XM, Konan M. A new feature selection method based on a validity index of feature subset. Pattern Recognition Letters. 2017;92:1–8.
  17. Haq MA. Planetscope nanosatellites image classification using machine learning. Computer Systems Science and Engineering. 2022;42(3):1031–46.
  18. Haq MA, Baral P. Study of permafrost distribution in Sikkim Himalayas using Sentinel-2 satellite images and logistic regression modelling. Geomorphology. 2019;333:123–36.
  19. Haq MA. CNN based automated weed detection system using UAV imagery. Computer Systems Science and Engineering. 2022;42(2):837–49.
  20. Haq MA, Azam MF, Vincent C. Efficiency of artificial neural networks for glacier ice-thickness estimation: a case study in western Himalaya, India. Journal of Glaciology. 2021;67(264):671–84.
  21. Waske B, Benediktsson JA, Arnason K, Sveinsson JR. Mapping of hyperspectral AVIRIS data using machine-learning algorithms. Canadian Journal of Remote Sensing. 2009;35:S106–S16.
  22. Krishna G, Sahoo R, Pradhan S, Ahmad T, Sahoo P. Hyperspectral satellite data analysis for pure pixels extraction and evaluation of advanced classifier algorithms for LULC classification. Earth Science Informatics. 2018;11(2):159–70.
  23. Gore A, Mani S, HariRam RP, Shekhar C, Ganju A. Glacier surface characteristics derivation and monitoring using Hyperspectral datasets: a case study of Gepang Gath glacier, Western Himalaya. Geocarto International. 2019;34(1):23–42.
  24. Shafri H, Affendi S, Shattri M. The performance of maximum likelihood, spectral angle mapper, neural network and decision tree classifiers in hyperspectral image analysis. Journal of Computer Science. 3(6):419–23.
  25. Yaseen, Son GY, Kwon S. Classification of heart sound signal using multiple features. Applied Sciences. 2018;8(12):2344.
  26. Yang C, Li Y, Zhang C, Hu Y. A fast KNN algorithm based on simulated annealing. In: International Conference on Data Mining; 2008.
  27. Haq MA. SMOTEDNN: A novel model for air pollution forecasting and AQI classification. CMC-Computers Materials & Continua. 2022;71(1):1403–25.
  28. Haq MA. CDLSTM: A novel model for climate change forecasting. CMC-Computers Materials & Continua. 2022;71(2):2363–81.
  29. Haq MA, Khan MAR. DNNBoT: Deep neural network-based botnet detection and classification. CMC-Computers Materials & Continua. 2022;71(1):1729–50.
  30. Chen W, Sun Q, Chen XM, Xie GC, Wu HQ, Xu C. Deep learning methods for heart sounds classification: a systematic review. Entropy. 2021;23(6). pmid:34073201
  31. Wu ZH, Huang NE, Long SR, Peng CK. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(38):14889–94. pmid:17846430
  32. Rios RA, de Mello RF. Applying empirical mode decomposition and mutual information to separate stochastic and deterministic influences embedded in signals. Signal Processing. 2016;118:159–76.
  33. Kushal TRB, Illindala MS. Correlation-based feature selection for resilience analysis of MVDC shipboard power system. International Journal of Electrical Power & Energy Systems. 2020;117.
  34. Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geoscientific Model Development. 2014;7(3):1247–50.
  35. Sim DG, Kwon OK, Park RH. Object matching algorithms using robust Hausdorff distance measures. IEEE Transactions on Image Processing. 1999;8(3):425–9. pmid:18262885
  36. Wang J, Shi M, Zhang X. Arrhythmia classification algorithm based on EMD and ApEn feature extraction. Chinese Journal of Scientific Instrument. 2016;37:168–73.
  37. Lyu HQ, Wan MX, Han JQ, Liu RL, Wang C. A filter feature selection method based on the maximal information coefficient and Gram-Schmidt orthogonalization for biomedical data mining. Computers in Biology and Medicine. 2017;89:264–74. pmid:28850898
  38. Peng HC, Long FH, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(8):1226–38. pmid:16119262
  39. Billah M, Waheed S. Minimum redundancy maximum relevance (mRMR) based feature selection from endoscopic images for automatic gastrointestinal polyp detection. Multimedia Tools and Applications. 2020;79(33–34):23633–43.
  40. Sakar CO, Kursun O, Gurgen F. A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy-Maximum Relevance filter method. Expert Systems with Applications. 2012;39(3):3432–7.
  41. Rodriguez-Lujan I, Huerta R, Elkan C, Cruz CS. Quadratic programming feature selection. Journal of Machine Learning Research. 2010;11:1491–516.
  42. Zhang YH, Zhang WS, Xie Y. Improved heuristic equivalent search algorithm based on maximal information coefficient for Bayesian network structure learning. Neurocomputing. 2013;117:186–95.
  43. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, et al. Detecting novel associations in large data sets. Science. 2011;334(6062):1518–24. pmid:22174245
  44. Dangare CS, Apte SS. Improved study of heart disease prediction system using data mining classification techniques. International Journal of Computer Applications. 2012;47(10):44–8.
  45. Xiao M, Yan H, Song JZ, Yang YZ, Yang XL. Sleep stages classification based on heart rate variability and random forest. Biomedical Signal Processing and Control. 2013;8(6):624–33.
  46. Yucelbas S, Yucelbas C, Tezel G, Ozsen S, Yosunkaya S. Automatic sleep staging based on SVD, VMD, HHT and morphological features of single-lead ECG signal. Expert Systems with Applications. 2018;102:193–206.
  47. Baghel N, Dutta MK, Burget R. Automatic diagnosis of multiple cardiac diseases from PCG signals using convolutional neural network. Computer Methods and Programs in Biomedicine. 2020;197:105750. pmid:32932128
  48. Oh SL, Jahmunah V, Ooi CP, Tan RS, Ciaccio EJ, Yamakawa T, et al. Classification of heart sound signals using a novel deep WaveNet model. Computer Methods and Programs in Biomedicine. 2020;196:105604. pmid:32593061
  48. 48. Oh SL, Jahmunah V, Ooi CP, Tan RS, Ciaccio EJ, Yamakawa T, et al. Classification of heart sound signals using a novel deep WaveNet model. Comput Methods Programs Biomed. 2020;196:105604. Epub 2020/06/28. pmid:32593061.