An automatic segmentation method for heart sounds

There are two major challenges in automated heart sound analysis: segmentation and classification. An efficient segmentation is capable of providing valuable diagnostic information of patients. In addition, it is crucial for some feature-extraction based classification methods. Therefore, the segmentation of heart sound is of significant value. This paper presents an automatic heart sound segmentation method that combines the time-domain analysis, frequency-domain analysis and time–frequency-domain analysis. Employing this method, the boundaries of heart sound components are first located, and the components are then recognized. Finally, the heart sounds are divided into several segments on the basis of the results of boundary localization and component identification. In order to evaluate the performance of the proposed method, quantitative experiments are performed on an authoritative heart sound database. The experimental results show that the boundary localization has a sensitivity (Se) of 100%, a positive predictive value (PPV) of 99.3% and an accuracy (Acc) of 99.93%. Moreover, the Se, PPV and Acc of component identification reach 98.63, 99.86 and 98.49%, respectively. The proposed method shows reliable performance on the segmentation of heart sounds. Compared with previous works, this method can be applied to not only normal heart sounds, but also the sounds with S3, S4 and murmurs, thus greatly increasing the applied range.

learn about patients' heart health. However, this conventional diagnostic approach is highly subjective and relies largely on physicians' experience [3]. Thus, an accurate and efficient automated heart sound analysis system is required.
There are two major challenges in developing an automated heart sound analysis tool: segmentation and classification. The primary operations of segmentation are positioning the boundaries of heart sound components and identifying the component types, namely S1, S2, S3, and S4. Several segmentation methods have been reported. However, most of the methods focus on either boundary detection or component identification. In addition, many methods only apply on normal heart sounds, thus severely limiting the application of these methods.
Liang et al. [4] proposed a Shannon-entropy-based heart sound segmentation method to recognize S1 and S2, but they did not study S3, S4, and murmurs. Kumar et al. [5] presented an S3 detection algorithm based on wavelet transform; however, this method could not detect the boundaries of the heart sound components. Moukadem et al. [6] developed an S-transform-based heart sound segmentation method; however, this method can only be applied to normal heart sounds. Springer et al. [7] successfully addressed the segmentation problem of noisy S1 and S2 by utilizing a logistic regression-hidden semi-Markov model. Nevertheless, this study, as most other studies, did not address the segmentation of heart sounds containing S3 and S4. In addition to the aforementioned studies, several other researchers have also contributed to the automated segmentation of heart sounds. For example, Naseri et al. [8] proposed a heart sound segmentation method based on the frequencyenergy method, which had excellent performance for various heart sounds. Boutana et al. [9] presented a time-frequency-analysis-based heart sound segmentation method and validated the efficiency of the method through some pathological heart sounds. Tang et al. [10] reported a dynamic-clustering-based segmentation method, which was effective for both normal and abnormal heart sounds. However, these methods all have some of the aforementioned limitations.
Based on the advantages and drawbacks of the methods mentioned above, the present study is focused on developing an automated heart sound segmentation algorithm that can position the boundaries and recognize the components for both normal heart sounds and heart sounds with S3, S4 and murmurs.
In this study, a method that combines time-domain analysis, frequency-domain analysis, and time-frequency-domain analysis was developed, and some novel strategies were proposed. The complete method comprises four parts, namely preprocessing, murmur elimination, boundary detection, and component identification. The overall process is presented in Fig. 1. First, the noisy raw heart sound is standardized in the preprocessing phase. Then, the signal goes through murmur elimination algorithm, where any possible murmurs are considerably eliminated without affecting the characteristics of the heart sound signal. In the following boundary detection phase, the onset and offset of each heart sound component are positioned and marked automatically. Finally, in the component identification phase, the cardiac cycle is calculated first, and the heart sound components are then recognized in each heartbeat on the basis of cardiac cycle.

Materials
To validate the proposed method, two quantitative experiments were performed, and the heart sounds from the University of Michigan's Heart Sound & Murmur Library were employed [11]. In the first experiment, 12 types of heart sounds with variable murmurs were used to verify the effect of murmur elimination. The total length of these sounds was 759 s, and the cardiac cycle number reached 888. The second experiment was performed to quantify the effectiveness of boundary detection and component identification. For this purpose, 16 types of sounds, including 2 normal and 14 abnormal types, were used, constituting a total of 1039 s (1251 cycles). The heart sound signals were recorded at 44.1 kHz, and each one lasted for at least 56 s. For the convenience of displaying and testing, each sound was divided into three-heart-cycle segments. Moreover, every two adjacent segments had an overlap of one cardiac cycle.

Down-sampling
Because the maximal frequency of the heart sounds did not exceed 1 kHz [12], the heart sound signals were down-sampled to 2 kHz according to the Nyquist-Shannon sampling theorem. Denoising A wavelet-based denoising method with soft threshold was utilized. The db4 wavelet was chosen as mother wavelet because its shape is similar to that of a heart sound signal, and the scale was established as 7 based on signal-noise-ratio (SNR) and normalized root mean squared error (NRMSE) [13]. The denoised heart sound was written as HS(n).

Normalization
To standardize the processing of heart sound, the heart sound signals were normalized as where N is the total number of sampling points.
Unless otherwise specified, the expressions "original heart sound" and "original signal" appearing in the following refer to HS N (n).

Murmur elimination
Different from the conventional constant-cutoff-frequency-based murmur elimination method (usually 200 Hz) [8,14], this paper presents a novel low-pass filter to remove the murmurs, namely the automatic-cutoff-frequency low pass filter (ALPF), whose cutoff frequency is calculated by analyzing the fast Fourier transform (FFT) of the heart sound.
Assuming that the modulus of the FFT sequence of HS N (n) is FFT H (n) , n = 0, 1, …, N − 1, the envelope of FFT H (n) is obtained by the moving average method as follows: where N is the sampling number of FFT H (n), and L F is the neighborhood radius of point n, L F ≪ N. Note that if L F is too large, some adjacent local peaks of E FFT (n) are merged together. Considering this and experimental results, L F is set to 5. The envelope is also normalized using (1) and is written as E F (n) . Unless otherwise specified, the "FFT coefficient" and "FFT envelope" appearing hereinafter refer to E F (n).
For the heart sounds without murmurs,E F (n) is generally comprised of primary peak and side peak (see in Fig. 2b). The primary peak is located in the low-frequency area with the highest amplitude, and the side peak is located in the relatively higherfrequency area with much smaller amplitude. Thus, most energy of heart sounds is concentrated within the frequencies of primary peak, and filtering out the side peak does not affect the shape of heart sounds (see in Fig. 2c, d). Therefore, the primary peak preserves the information of heart sound components (S1, S2, S3 and S4). Liu et al. BioMed Eng OnLine (2018) 17:106 For the heart sounds with murmurs, the primary peak still exists, but the side peak is usually merged with the frequency band of murmurs (see in Fig. 2f ). Considering that the murmurs usually have higher frequency than the primary peak, the frequency of primary peak's ending point is an appropriate estimation of the cutoff frequency to remove murmurs.
In this study, the ending point of primary peak is searched in the range of 20 to 200 Hz and is determined as the first valley point (the lowest point between two adjacent spikes) Fig. 2 The performance of ALFP. a Is a normal heart sound, c is the filtered normal sound by ALFP, e is a heart sound with S3 and systolic murmurs, g is the murmur-eliminated sound by ALPF. b, d, f and h Are the FFT of a, c, e and g, respectively. In b, d, f and h, the yellow curves are the FFT sequences, and the blue curves are the envelopes of them. In b, the points marked by purple diamonds are two examples of spikes. In b and f, the points marked by red squares and green squares are the primary peak points and the last valley points with FFT coefficient smaller than 0.2, respectively after the primary peak point with FFT coefficient smaller than 0.2. The value 0.2 is determined based on experimental experience. If there is no such an ending point between 20 and 200 Hz, the cutoff frequency is set to 200 Hz.
HS N (n) is filtered using this frequency, and the filtered sound is then normalized; the normalized sound is denoted as HS NLP .

Envelope extraction
After removing the interference of murmurs, the next step is to calculate the onsets and offsets of heart sound components. Because the envelope can reduce the complexity of computing while preserving the location information of the signals, the closing operation of mathematical morphology is utilized to obtain the envelope in the proposed method and is defined as where f(n) refers to the input signal, and g(n) is a structure element; ⨁ denotes the dilation operation, and ⨂ denotes the erosion operation. The lengths of f(n) and g(n) are P and Q respectively, and generally P > Q [15,16].
The choice of structure element g(n) directly affects the shape of the envelope obtained by this operation [17]. In this study, in order to make the shape of envelope simple, g(n) is designed as Moreover, Q, the length of g(n) , is also a significant parameter. According to physiological knowledge, in general, the time interval between adjacent heart sound components (S1S2 interval, S2S3 interval, S4S1 interval, etc.) is no less than 100 ms [8,18], which corresponds to 200 points under the sampling rate of 2 kHz. Thus, Q should be smaller than 200. In this study, Q was set as 30 on the basis of the experimental results.
The f (n) in (3) is replaced by HS NLP , and the heart sound envelope is obtained. The envelope is then normalized. The result is denoted as E NLP .
Although the ALPF can effectively reduce murmurs, the filtered heart sound may retain some low-amplitude residuals, which are also reflected in the envelope and affect the detection of boundaries. In this method, three operations are used to overcome this problem.

Thresholding processing
The first operation is thresholding.
For some murmurs, such as early systolic murmurs, their onsets partially overlap the offsets of S1s. After filtering by ALPF, the residues of murmurs may still be linked with S1s. Consequently, the murmur residues and S1s may be mixed together in E NLP . Therefore, the purpose of thresholding is to separate the boundaries of murmurs from those of S1s to conduct the subsequent operations. Consequently, a small threshold can complete this task. Due to the use of small threshold, the boundary information of the heart sound components can be preserved as much as possible.
The envelope Ẽ NLP after thresholding is calculated as and θ C = 0.025. The parameter θ A enables the algorithm to determine an appropriate threshold based on the characteristics of the signal itself, and θ C is a reference threshold, preventing error in θ A ; the value of θ C is determined by experimental results.

Opening operation
After thresholding, the boundaries of heart sound components and interfering residues are separated. The remaining residues are considerably smaller than the heart sound components in both duration and amplitude. Because the opening operation of mathematical morphology can reduce spikes, it is utilized as the second stage of residue removal. The opening operation is defined as [19] The g(n) in opening operation is still defined using (4), but the length of g(n) is denoted as Q' . Besides, the parameters in (6) are same as those in (3).
As in the closing operation, Q' is also a significant parameter. Because the average durations of S1 and S2 are approximately 100 ms and the durations of S3 and S4 are approximately 50 ms [8], which correspond to 200 and 100 sampling points, respectively. Thus, Q' must be smaller than 100; in this study, it was set to 50 based on experimental results. The results of the opening operation are normalized, and the normalized signal is denoted as Ẽ NO .

Energy thresholding processing
Finally, to completely exclude the interference components from the envelope, the energy thresholding approach is used. Considering that Ẽ NO is comprised of points of zero and non-zero, i.e., the regions located by S1, S2, S3, S4 and few murmur residues are non-zero, and the rest regions are zero. Therefore, the boundaries of each component (including the interference components escaping the opening operation) can be automatically obtained as follows: then, the ith point is determined as a potential onset; the jth point is determined as a potential offset. The potential onset and the potential offset always appear in pairs and these onsets and offsets are denoted as SPT(i) and EPT(j) respectively, where i, j = 0, 1, . . . , M − 1 . M is the number of potential onsets/offset.
Consequently, the energy of each component can be obtained as Liu et al. BioMed Eng OnLine (2018) 17:106 Assuming for each E k , then the kth potential onset and offset are considered invalid and removed from SPT and EPT. Moreover, the points between these invalid onsets and offsets are determined as interference parts and are set to zero using (8) In this inequality, η is set to 0.25 based on experimental experience. The remaining potential points are determined as the final onsets and offsets of the heart sound components.

Cardiac cycle calculation
The last step is to recognize the heart sound components. Considering the quasi-periodic nature of heart sounds, this step can be more efficiently accomplished if the cardiac cycle is calculated. In some studies, the cardiac cycle was calculated by using the partial autocorrelation function (PACF) [10,15]. However, because of the inherent defects of PACF, the calculating results are not satisfactory. In order to overcome the shortcomings of PACF, this study proposes a cardiac cycle calculation method based on the unbiased autocorrelation function (UACF), considerably improving the applicability.
where m = 0, 1, . . . N − 1, N is the sampling number of Ẽ NO . Figure 3 demonstrates the PACF and UACF. For PACF, as m increases, the number of sampling points involved in multiplication and accumulation decreases gradually. Consequently, the primary peaks of the PACF exhibit gradual attenuation (see in Fig. 3b), causing the side peaks between the primary peaks to be interference of the cardiac cycle calculation. By contrast, the UACF only averages the terms that are involved in multiplication and accumulation, thereby overcoming the drawback of primary peak's attenuation in the PACF. Figure 3c shows that there is a relatively significant difference in amplitude between the primary peaks and side peaks. To further expand the amplitude difference, square energy, which is defined in (11), is utilized. After obtaining the energy signal, Liu et al. BioMed Eng OnLine (2018) 17:106 thresholding is used to remove the side peaks. The energy signal of the UACF and the energy signal after thresholding are denoted as E R ′ and Ẽ R ′ , respectively.
The last peak of Ẽ R ′ is likely to be a side peak that escaped from thresholding because of its relatively higher amplitude (see in Fig. 4c). This kind of side peak is usually generated by the multiplication and accumulation of S1 and S2. In order to calculate the cardiac cycle accurately, the last peak of Ẽ R ′ is forcibly removed regardless of whether it is a side peak or a primary peak (see in Fig. 4d). Finally, the UACF of Ẽ R ′ with the last peak removed is calculated, and this UACF is denoted as R Final (see in Fig. 4e). Then, thresholding is performed on R Final with a threshold of 0.5 × max (R Final ), leaving only primary peaks in the thresholded R Final . The intervals between adjacent peaks  Figure 4 shows the main procedures of cardiac cycle calculation.

Component recognition
After determining the cardiac cycle, the heart sound components can be recognized beat by beat. Under normal circumstances, in terms of the number of components in one heartbeat, heart sounds can be divided into three categories: two components (S1 and S2), three components (S1, S2, S3, or S4) and four components (S1, S2, S3, and S4). Moreover, in the time domain, S1S2 interval < S2S1 interval, S2S3 interval < S3S1 interval, and S2S4 interval > S4S1 interval, whereas in the frequency domain, the frequencies of S1 and S2 are usually higher than those of S3 and S4 [8,15]. These priori knowledge along with the mentioned three categories can be utilized to recognize the heart sound components. Figure 5 presents the overall process of component recognition. The algorithm finds a heartbeat based on cardiac cycle and automatically obtains its component number. Then, the recognition processing is performed according to the component number found. For the convenience of expression, in the following part, the jth component of the ith heartbeat is denoted as C j i , and the time interval between C m p and C n q is denoted as T mn p,q .
Assuming that the i th heartbeat is being analyzed, then the recognition processing can be expressed as follows.
Two components: In this case, T i,i 12 and T i,i+1 21 are calculated. If T i,i 12 < T i,i+1 21 , C i 1 is identified as S1, and C j 2 is identified as S2; otherwise, C i 1 is S2, and C j 2 is S1. Three components: In this case, because S3 and S4 are less than S1 and S2 in frequency, these three components can be recognized using three steps: detecting S3/S4, recognizing S1 and S2, and identifying S3/S4.
Time-frequency analysis is required for the detection of S3 and/or S4. In this study, S-transform is employed to accomplish this task. Assuming that the signal to be analyzed is h(n), n = 0, 1, …, N − 1, its Fourier transform is H(k/NT ), k = 0, 1, . . . , N − 1, where T is the sampling interval. Then, the S-transform of h(n) is given by [20].
To obtain sufficient time-frequency information, the S-transform is directly performed on the original heart sound signal HS N (n). The result is a complex matrix and is written as S. Suppose that the element of the mth row in the nth column of the matrix is S m,n = a + jb, then the modulus of S m,n is defined as.
The modulus of each element of S is calculated, and the modulus matrix |S| is then obtained. Unless otherwise specified, both the expression "S transform" and "S coefficient" in the following text refer to |S|.
The two-dimensional |S| matrix is relatively difficult to analyze; thus, the onedimensional instantaneous frequency is utilized to detect S3 and S4, and is defined as.
where F (m) = m NT = m·f s N represents the frequency of the mth row of the S-transform, and f s is the sampling rate. Because the maximal frequency of the sampled signal is half of the sampling frequency, F (m) was set as 1 2 m NT = m·f s 2N , m = 0, 1, . . . , N − 1 in this study. Furthermore, the boundaries of heart sound components are obtained in the boundary detection phase (i.e., SPT and EPT). Therefore, by extracting the parts between these onsets and offsets from f H (n), the instantaneous frequency of heart sound components is obtained (see in Fig. 6(3)).
Finally, the average instantaneous frequency is calculated. Figure 6a-d present an example of these operations. It is observed that the average instantaneous frequency of S4 is significantly lower than those of S1 and S2. Similarly, the instantaneous frequency of S3 is also the lowest among S1, S2 and S3. Consequently, the component with the lowest frequency of each heartbeat is S3 or S4, which means that the previous component is S2 and the next is S1.
Without loss of generality, assuming that C i 2 is the component with the lowest frequency, C i 1 and C i 3 can be easily determined as S2 and S1, respectively. Then, T i,i 12 and T i,i 23 are calculated and compared. If T i,i 12 < T i,i 23 , C i 2 is recognized as S4; otherwise, if T i,i 12 > T i,i 23 , C i 2 is S3. Four components: In this case, the four components are arranged in the order: S1 → S2 → S3 → S4 → ··· S1 → S2. Thus, the first one of the two components with the lowest instantaneous frequencies is S3, and the second is S4. Moreover, the component preceding the detected S3 and S4 is S2, and the following component is S1.
After all the components in the ith heartbeat are recognized, the algorithm finds the next beat, and performs the same recognition processing. When all the heartbeats are analyzed, the recognition process is finished.

Murmur elimination evaluation
For the evaluation of murmur elimination, a novel index, the signal murmur ratio (SMR), is proposed as follows: where X(n) is the heart sound signal, U is the region where heart sound components (S1, S2, S3, and S4) are located, and V is the region where murmurs are located. To ensure the validity of the indicator, both U and V are manually determined from the original signal HS N (n). Thereby, for a certain heart sound, U and V are constant when calculating SMRs of the murmur-reduced heart sound and the non-murmur-reduced heart sound (see in Fig. 7). A higher SMR indicates a smaller murmur energy proportion in signal X(n) , which means a stronger murmur removing effect.

Boundary detection and component identification evaluation
To validate the performance of boundary detection and component identification, three classical metrics, sensitivity (Se), positive predictive value (PPV), and accuracy (Acc) are calculated. These three indices are defined as follows: Time-frequency analysis: a is a heart sound with murmurs and S4; b is the instantaneous frequency of this heart sound; c is the instantaneous frequency only for the heart sound components; d is the average instantaneous frequency of c. "NA" represents "normalized amplitude" In this experiment, the onsets and offsets of heart sound components were automatically marked with green circles and red stars by the algorithm (see in Fig. 8c), respectively. In addition, the names of these components were automatically labeled above them (see in Fig. 8d). The results of boundary localization and component identification were judged by human and quantified using (17)- (19), in which the FN means false negative, FP means false positive, TN means true negative and TP means true positive. Murmur elimination effect for heart sound containing S4 and systolic murmurs using the three methods. a Is the non-murmur-reduced heart sound, and b-d show the sounds filtered by ALPF, WT and 200-LPF, respectively. In the calculation of SMR, U = U1 + U2 + · · · + U6 , and V = V 1 + V 2 + V 3 Fig. 8 The results of murmur elimination, boundary detection and component identification for heart sound containing S4 and systolic murmurs

Results of murmur elimination
The performance of ALPF was compared with two existing methods. The first method is the conventional low-pass filter with a constant cutoff frequency of 200 Hz (200-LPF), and the second method is the wavelet package transform method with a db10 mother wavelet and a scale of 5 (WPT) [14]. In the WPT method, the heart sound components were reconstructed by nodes (5, 0) to (5,5) to remove interference components such as murmurs. The SMR of the heart sounds filtered by the ALPF, WPT and 200-LPF are denoted as SMR A , SMR WPT , and SMR 200 , respectively. Moreover, the SMR of the original heart sounds HS N (n) was calculated and denoted as SMR Ori . In addition, the SMR gains of these three methods were achieved by subtracting SMR Ori from SMR A , SMR WPT and SMR 200 , respectively. All the experimental parameters of these three methods were identical. Tables 1 and 2 show the statistical results of SMR and SMR gains. It is observed that the proposed ALPF has the highest score of SMR and SMR gains. The overall  Figure 7 shows an example of the murmur elimination results using the three methods.

Results of boundary detection and component identification
A detailed quantitative description of the boundary positioning result is shown in Table 3. It is observed that the proposed method achieves a sensitivity of 100% for all 16 types of heart sounds, and both the PPV and accuracy are 99.93%, indicating an outstanding performance. Table 4 presents the quantitative data of component identification. Because of the good positioning effect, the identification performance is high, with an average Se of 98.63%, an average PPV of 99.86%, and an average accuracy of 98.49%. Figure 8 shows the complete process of the proposed method, including murmur elimination, boundary detection, and component identification. The first component (S4) in Fig. 8d is not labeled. This is because the identification of S4 requires   pathological cases of heart sounds, respectively. As mentioned in the "Background", a complete segmentation is comprised of boundary detection and component identification. However, several existing works only focus on one of them. Moreover, some studies could be applied only to the basic situation, for example, heart sounds without murmurs. Therefore, in addition to achieving excellent performance, the proposed approach achieves the functions of both boundary localization and component identification and can be applied in more conditions.

Conclusions
This paper presents an accurate heart sound segmentation algorithm that combines time-domain, frequency-domain and time-frequency-domain analysis. Compared to existing studies, this method is applicable to a wide range of heart sounds, from normal to those containing S3, S4 and various murmurs. To verify this method, quantitative experiments were performed using the University of Michigan's Heart Sound & Murmur Library, an authoritative open database. The experimental materials incorporated two types of normal heart sounds and 14 types of abnormal heart sounds. The results show that the boundary localization has an average Se of 100%, an average PPV of 99.3% and an average Acc of 99.93%. Moreover, the Se, PPV and Acc of the component identification reach 98.63%, 99.86% and 98.49%, respectively, indicating outstanding performance of the proposed method. There are still some shortcomings of this work. For example, the component identification relies on the success of cardiac cycle calculation; therefore, this method cannot be applied to the heart sound with severe arrhythmia because of the failure to achieving accurate cardiac cycle by using UACF. The study of segmentation provides a good basis for extracting significant features of heart sounds. Therefore, the further study will focus on the classification of heart sounds.