A Fast Instantaneous Frequency Estimation for Underwater Acoustic Target Feature Extraction

Traditional auditory features only represent the amplitude characteristics of target signals in the frequency domain. Such features are susceptible to environmental noise, which significantly degrades recognition stability. Inspired by the use of instantaneous information in speech signal processing, this paper proposes a feature extraction method based on subband instantaneous frequency. A fast instantaneous frequency extraction algorithm is proposed using a normalized Gammatone filterbank. Experiments confirm that the proposed feature extraction method effectively maintains recognition accuracy under low SNR conditions while reducing the computational cost.


Introduction
Currently, underwater acoustic target recognition based on passive sonar is still one of the most important tools in the field of underwater detection [1]. The radiated noise emitted by vessels such as ships contains important information about them [2], so analyzing the differences in this noise makes it possible to distinguish between vessels. In recent years, deep-learning-based underwater acoustic target recognition methods have developed rapidly, but such algorithms often require a large number of samples [3] [4]. In practice, however, it is extremely difficult to obtain valid samples of key targets. Therefore, traditional feature extraction algorithms that work with a small number of samples still play a crucial role in underwater acoustic target recognition.
Cepstral feature extraction methods inspired by the human auditory system, such as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC), have been widely used in marine mammal detection and underwater target detection in recent years [5]. However, because ocean noise is complex, such bionic cepstral features are prone to fail in complex environments. To adapt to the complex underwater environment, instantaneous frequency (IF) [6] was introduced to complement the original features. Although IF can improve recognition accuracy to some extent, it inevitably increases the complexity of the recognition method, which in turn places high demands on the hardware platform.
To address target recognition in complex environments with small sample sizes, this paper proposes a more efficient IF estimation method based on auditory cepstral features and uses it to construct an efficient, integrated feature extraction method for underwater acoustic targets. In terms … [7]. Thus, SVM is used as the classifier in the experiments to allow a reasonable evaluation of the feature extraction algorithm.

Traditional GFCC
GFCC refers to the cepstral coefficients computed from the subband outputs of a Gammatone filterbank [8] [9]. In essence, a bank of Gammatone filters is used to simulate the frequency analysis performed by the cochlea, which makes the features robust to variations in frequency resolution and to environmental noise. The flow of traditional GFCC feature extraction is shown in Fig. 1.

[Fig. 1: flow of traditional GFCC feature extraction; blocks include "Preprocessing" and "Calculate the energy spectrum".]
The specific steps of GFCC feature extraction are as follows.
(1) Pre-processing. A Hamming window is applied to each frame of the original signal to reduce truncation effects. The frame overlap is set to zero because the underwater acoustic signals of ship targets are usually stationary.
(2) Gammatone filtering. Since the energy of ship targets is concentrated in the low and medium frequency bands, a 24-channel Gammatone filterbank is constructed to filter the data frames. For time-domain filtering, the Infinite Impulse Response (IIR) implementation greatly improves computational efficiency, but the all-pole IIR filter exhibits a large spectral shift in certain frequency bands compared with the ideal Finite Impulse Response (FIR) implementation. Therefore, when high precision at high frequencies is not required, the filtering can be performed in the frequency domain. In this way, the output of the 24-channel Gammatone filterbank can be represented as the vector $s(m)$, where $m$ is the channel index.
(3) Nonlinear compression. To better simulate the nonlinear response of the human ear to sound intensity, the output of each filter is compressed with a cube-root operation.
(4) Discrete cosine transform. The filter outputs are decorrelated with the discrete cosine transform (DCT). The 1st- to 12th-order coefficients are retained, the 0th-order coefficient is discarded, and the result is used as the GFCC. In the DCT expression for GFCC, $u$ is the component index and the number of sub-filters is $M = 24$.
(5) Dynamic feature extraction. To characterize how underwater targets change over time, first-order and second-order differential GFCC features are also extracted to enhance recognition capability.
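The five steps above can be strung together in a short NumPy sketch. It is illustrative only and rests on assumptions the text does not state: the 24 centre frequencies are taken as ERB-rate spaced between 50 Hz and 5000 Hz (the band used later for IF analysis), the frequency-domain Gammatone response uses the common single-sided magnitude approximation $[1 + ((f - f_c)/b)^2]^{-2}$ with $b = 1.019\,\mathrm{ERB}(f_c)$, squared to act on the energy spectrum, the DCT is a DCT-II with $\sqrt{2/M}$ scaling, and the differential features are plain frame-to-frame differences. The helper names (`gfcc_features`, `dct_ii`, `delta`, and so on) are hypothetical.

```python
import numpy as np

FS = 22050          # sampling rate used in the experiments (Hz)
FRAME_LEN = 2048    # samples per frame (about 0.1 s)
M = 24              # number of Gammatone sub-filters
N_CEPS = 12         # keep DCT orders 1..12, drop order 0


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale (assumed spacing)."""
    rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda r: (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(rate(f_low), rate(f_high), n))


def gammatone_power_response(fs, n_fft, centres, order=4):
    """|H(f)|^2 of each 4th-order Gammatone filter on the rfft grid (single-sided approximation)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    b = 1.019 * erb(centres)[:, None]
    mag = (1.0 + ((freqs[None, :] - centres[:, None]) / b) ** 2) ** (-order / 2.0)
    return mag ** 2


def frame_signal(x, frame_len):
    """Step (1): non-overlapping frames, each multiplied by a Hamming window."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len) * np.hamming(frame_len)


def dct_ii(v, n_out):
    """DCT-II over the channel axis with sqrt(2/M) scaling:
    c(u) = sqrt(2/M) * sum_m v(m) * cos(pi * u * (m + 0.5) / M), u = 0..n_out-1."""
    m_count = v.shape[-1]
    m = np.arange(m_count)
    u = np.arange(n_out)[:, None]
    basis = np.sqrt(2.0 / m_count) * np.cos(np.pi * u * (m + 0.5) / m_count)
    return v @ basis.T


def delta(c):
    """Frame-to-frame difference; the first frame is repeated so the shape is preserved."""
    return np.diff(c, axis=0, prepend=c[:1])


def gfcc_features(x, fs=FS):
    spectra = np.abs(np.fft.rfft(frame_signal(x, FRAME_LEN), axis=1)) ** 2  # energy spectrum
    weights = gammatone_power_response(fs, FRAME_LEN, erb_space(50.0, 5000.0, M))
    s = spectra @ weights.T                          # step (2): s(m), one energy per channel
    c = dct_ii(np.cbrt(s), N_CEPS + 1)[:, 1:]        # steps (3)-(4): cube root, DCT, drop order 0
    d1 = delta(c)                                    # step (5): first-order differential GFCC
    d2 = delta(d1)                                   #           second-order differential GFCC
    return np.hstack([c, d1, d2])


rng = np.random.default_rng(0)
x = rng.standard_normal(FS)      # 1 s of toy input in place of a ship recording
features = gfcc_features(x)
print(features.shape)
```

On one second of toy input this yields 10 frames of 36 coefficients (12 GFCC plus 12 first-order and 12 second-order differences); the first frame's differential coefficients are zero because of the edge handling in `delta`.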
The Gammatone filterbank is a linear bionic model inspired by the auditory function of the human cochlea. One of the most significant characteristics of auditory perception is that the subband bandwidth of the human ear increases with frequency. Macroscopically, this appears as a lower frequency resolution for high-frequency signals.
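The growth of bandwidth with frequency can be made concrete with the Glasberg and Moore equivalent rectangular bandwidth, $\mathrm{ERB}(f) = 24.7\,(4.37 f/1000 + 1)$ Hz. The sketch below assumes, as many Gammatone implementations do, that the 24 centre frequencies are spaced uniformly on the ERB-rate scale between 50 Hz and 5000 Hz; the paper does not state its spacing rule, so this is an illustrative choice.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f_hz
    (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale."""
    rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda r: (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(rate(f_low), rate(f_high), n))


centres = erb_space(50.0, 5000.0, 24)   # 24 channels, as in the paper
bandwidths = erb(centres)
for fc, bw in zip(centres[::6], bandwidths[::6]):
    print(f"fc = {fc:7.1f} Hz   ERB = {bw:6.1f} Hz")
```

Under these assumptions the bandwidth grows from about 30 Hz for the 50 Hz channel to about 564 Hz for the 5000 Hz channel, so the high-frequency channels resolve frequency much more coarsely.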

Normalized GFCC
To simulate the decline of cochlear frequency resolution with increasing frequency, the Gammatone filters are normalized across high and low frequencies. This increases the weighting of low frequencies and decreases the weighting of high frequencies, making the process more physiologically realistic. The normalized Gammatone filterbank is shown in Fig. 2. In summary, GFCC essentially characterizes the energy distribution of the target in the frequency domain.
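The text does not specify the normalization rule, so the sketch below shows one hypothetical way to obtain the described weighting: each filter's magnitude response is scaled to unit area. Because the high-frequency filters are wider, their peak gain falls, which weights low frequencies up and high frequencies down. The ERB-rate spacing, the single-sided Gammatone magnitude approximation, and the function names are likewise assumptions.

```python
import numpy as np


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale (assumed spacing)."""
    rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda r: (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(rate(f_low), rate(f_high), n))


def gammatone_bank(fs, n_fft, n_ch=24, f_low=50.0, f_high=5000.0, order=4):
    """Magnitude responses (n_ch x n_fft//2+1), single-sided Gammatone approximation."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fc = erb_space(f_low, f_high, n_ch)
    b = 1.019 * erb(fc)[:, None]
    H = (1.0 + ((freqs[None, :] - fc[:, None]) / b) ** 2) ** (-order / 2.0)
    return fc, H


def normalize_unit_area(H):
    """HYPOTHETICAL normalization: scale every filter so its response sums to 1.
    Wide high-frequency filters then get a low peak gain and narrow low-frequency
    filters a high one, i.e. low frequencies are weighted up and high ones down."""
    return H / H.sum(axis=1, keepdims=True)


fc, H = gammatone_bank(fs=22050, n_fft=2048)
H_norm = normalize_unit_area(H)
peak_gain = H_norm.max(axis=1)
print(f"peak gain: {peak_gain[0]:.4f} at {fc[0]:.0f} Hz, {peak_gain[-1]:.4f} at {fc[-1]:.0f} Hz")
```

With a 2048-point FFT at 22050 Hz, the lowest channel's peak gain comes out many times larger than the highest channel's, which is the qualitative low-up, high-down weighting described above.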

Direct IF Extraction
In speech recognition, the phase information in the frequency spectrum can be used to characterize and recognize speakers [10] [11]. A 24-channel Gammatone filterbank can separate the broadband spectrum between 50 Hz and 5000 Hz into several subband signals. Each subband signal contains frequency components within a relatively fixed range, so it can be treated as a narrowband signal, and IF estimation can then be performed on each subband signal.
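A minimal sketch of the subband separation follows. It assumes ERB-rate spaced centre frequencies, 80 ms sampled Gammatone impulse responses scaled to unit gain at their centre frequency, and FFT-based linear convolution; the paper does not give these implementation details, and the function names are hypothetical. A 1 kHz tone is used as a toy input to show that its energy lands mainly in the channel centred nearest to it.

```python
import numpy as np


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale (assumed spacing)."""
    rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda r: (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(rate(f_low), rate(f_high), n))


def gammatone_ir(fc, fs, dur=0.08, order=4):
    """Sampled 4th-order Gammatone impulse response, scaled to unit gain at fc."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    gain_at_fc = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain_at_fc


def split_subbands(x, fs, n_ch=24, f_low=50.0, f_high=5000.0):
    """Return the centre frequencies and an (n_ch, len(x)) array of subband signals a(t, l)."""
    fc = erb_space(f_low, f_high, n_ch)
    irs = [gammatone_ir(f, fs) for f in fc]
    n = len(x) + len(irs[0]) - 1                 # full linear-convolution length
    X = np.fft.rfft(x, n)
    bands = [np.fft.irfft(X * np.fft.rfft(h, n), n)[: len(x)] for h in irs]  # causal output
    return fc, np.stack(bands)


fs = 22050
t = np.arange(int(0.5 * fs)) / fs
x = np.cos(2 * np.pi * 1000.0 * t)               # toy input: a single 1 kHz component
fc, a = split_subbands(x, fs)
energy = np.sum(a ** 2, axis=1)
print(f"strongest channel: {fc[np.argmax(energy)]:.1f} Hz")
```

Each row of `a` plays the role of one narrowband subband signal $a(t,l)$ in the IF analysis.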
For a subband real signal $a(t,l)$, $l = 1, 2, \ldots, 24$, the IF can be calculated as follows. (1) The Hilbert transform is used to construct its analytic signal $z(t,l) = a(t,l) + j\,\hat{a}(t,l)$, where $\hat{a}(t,l)$ is the Hilbert transform of $a(t,l)$. (2) The phase of the complete analytic signal, $\varphi(t,l) = \arg z(t,l)$, is differentiated to obtain the IF, $f(t,l) = \frac{1}{2\pi}\frac{d\varphi(t,l)}{dt}$ (see Fig. 3). Since the analytic subband signal contains only positive frequency components, its phase rotates counterclockwise in the complex plane. However, the phase $\varphi(t,l)$ is defined in the range $(-\pi, \pi)$, so a jump of $2\pi$ occurs each time the phase crosses the boundary from $\pi$ to $-\pi$; these jumps are the main cause of the discontinuities and abrupt changes in the estimated frequency.
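The direct IF computation can be sketched as follows. The analytic signal is built in the frequency domain (the same construction used by `scipy.signal.hilbert`, reimplemented here so the example needs only NumPy), the phase is taken with `np.angle`, and the IF is the scaled first difference of the phase. A pure 440 Hz tone with exactly 88 cycles stands in for a subband signal; this toy input and the function names are assumptions chosen so that the result is easy to check.

```python
import numpy as np


def analytic_signal(a):
    """Analytic signal a + j*H{a} built in the frequency domain
    (the same construction scipy.signal.hilbert uses)."""
    n = len(a)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(a) * h)


def direct_if(a, fs):
    """Wrapped phase phi(n) and direct IF f(n) = fs / (2*pi) * (phi(n+1) - phi(n))."""
    phase = np.angle(analytic_signal(a))        # wrapped to the principal interval
    return phase, fs / (2 * np.pi) * np.diff(phase)


fs = 22050
t = np.arange(4410) / fs                       # 0.2 s, exactly 88 cycles of 440 Hz
a = np.cos(2 * np.pi * 440.0 * t)              # toy stand-in for one subband signal a(t, l)
phase, f_direct = direct_if(a, fs)
jumps = np.flatnonzero(f_direct < 0)           # pi -> -pi wrap points show up as negative spikes
print(len(jumps), f_direct[jumps[0]])
```

Away from the wrap points the estimate equals 440 Hz, but each of the 88 wraps from $\pi$ to $-\pi$ produces a spike of $440 - 22050 = -21610$ Hz, which is exactly the discontinuity described above. Unwrapping the phase (for example with `np.unwrap`) would remove the spikes but still leaves one estimate per sample.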

Optimized IF Extraction
Inspecting the first-order phase differences obtained by direct derivation shows that the negative abrupt change points occur at regular intervals, and that these intervals are distributed differently in different channels. This distribution can therefore be used to optimize the IF calculation. Motivated by this, an optimized IF calculation method is given as follows, in which the phase is represented as the discrete signal $\varphi(n,l)$. (1) Apply a further differential operation to the directly derived phase difference so that the abrupt changes (anomalies) are retained.
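The result is the second-order differential phase vector, whose spikes correspond to the abrupt changes in each channel.
The optimized IF is the reciprocal of the time elapsed between successive negative abrupt changes $n_k$ and $n_{k+1}$ in channel $l$, i.e., $f_{\mathrm{optimal}}(k,l) = F_s / (n_{k+1} - n_k)$, where $F_s$ is the sampling frequency. For each subband signal, the number of frequency points $K$ in $f_{\mathrm{optimal}}(k,l)$ obtained by this estimation method is much smaller than the number of elements $N$ in $f(n,l)$ obtained by amplitude-weighted smoothing.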
The dimensionality of $f_{\mathrm{optimal}}(k,l)$ is greatly reduced compared with the original signal, because $f_{\mathrm{optimal}}(k,l)$ is defined as the reciprocal of the time taken by the phase to traverse one cycle of $(-\pi, \pi)$, so each cycle yields only one value. The optimized IF not only has a clear physical meaning but also makes the frequency estimate more accurate, while greatly reducing the computational complexity.
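The optimized procedure can be sketched under two assumptions that the text leaves open: the negative abrupt changes are detected by thresholding the second-order phase difference at $-\pi$, and each IF value is taken as $F_s$ divided by the number of samples between consecutive change points. The function name and threshold are hypothetical, and the same 440 Hz toy tone is used for illustration.

```python
import numpy as np


def analytic_signal(a):
    """Analytic signal via the FFT (equivalent to scipy.signal.hilbert)."""
    n = len(a)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(a) * h)


def optimized_if(a, fs, threshold=-np.pi):
    """One IF value per phase cycle: f_opt(k) = fs / (n_{k+1} - n_k), where n_k are the
    negative abrupt-change points of the second-order phase difference.
    The fixed threshold of -pi is an assumption of this sketch."""
    phase = np.angle(analytic_signal(a))        # phi(n), wrapped
    d2 = np.diff(phase, n=2)                    # second-order differential phase
    change_points = np.flatnonzero(d2 < threshold)
    return fs / np.diff(change_points)


fs = 22050
t = np.arange(4410) / fs
a = np.cos(2 * np.pi * 440.0 * t)              # toy subband signal, 88 full cycles
f_opt = optimized_if(a, fs)
print(len(f_opt), len(a) - 1)                  # K optimized values versus N direct-IF values
print(np.unique(np.round(f_opt, 2)))
```

For this tone the sketch returns $K = 87$ values instead of the $N = 4409$ per-sample direct estimates. In this simple form each value is quantized to $F_s$ divided by an integer sample count (here 441.0 Hz or about 432.35 Hz), which limits the resolution of this basic version.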

Experiments
In the experiments, measured underwater acoustic signals of ships recorded by passive sonar are used. The dataset contains six types of vessel targets, labelled Class A, Class B, Class C, Class D, Class E, and Class F. The sampling frequency is 22050 Hz. The measured signal-to-noise ratio (SNR) is well above 20 dB, so the recordings can be regarded as noise-free. In preprocessing, each frame of 2048 points (about 0.1 s) is taken as one sample, and the numbers of samples for the six target categories are 211, 212, 214, 165, 208, and 206, respectively.
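A quick arithmetic check of the dataset figures: a 2048-point frame at 22050 Hz lasts about 0.093 s, consistent with "about 0.1 s", and the six classes give 1216 samples in total. The total duration below assumes the samples are non-overlapping frames, in line with the zero overlap used in preprocessing.

```python
FS = 22050                # sampling frequency (Hz)
FRAME_LEN = 2048          # points per frame, i.e. per sample
counts = {"A": 211, "B": 212, "C": 214, "D": 165, "E": 208, "F": 206}

frame_seconds = FRAME_LEN / FS                 # duration of one sample
total_samples = sum(counts.values())           # samples over all six classes
total_seconds = total_samples * frame_seconds  # audio covered, assuming non-overlapping frames

print(f"{frame_seconds:.4f} s per sample, {total_samples} samples, {total_seconds:.1f} s in total")
```

This prints `0.0929 s per sample, 1216 samples, 112.9 s in total`.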
For classification, SVM is used as the classifier, and three types of features, namely MFCC, N-GFCC (normalized GFCC), and N-GFCC+direct IF, serve as references for testing the characterization ability of the proposed combination of GFCC and IF features, denoted N-GFCC+optimized IF.
In addition, to test the robustness of the feature extraction algorithm to noise, underwater acoustic signals with different SNRs were simulated by adding Gaussian white noise of different intensities. Each group of data was tested five times, and the average of the five results was taken as the final result. The detailed recognition accuracy and computation cost are shown in Table 1. The results show that N-GFCC achieves high recognition accuracy (above 90%) when the SNR is greater than 5 dB and still maintains high accuracy at relatively low SNRs. Compared with MFCC and N-GFCC, both N-GFCC+direct IF and N-GFCC+optimized IF improve recognition accuracy under low SNR conditions, indicating that the integrated features have clear advantages in noise immunity, and their overall recognition performance is significantly improved. The main reason is that the subband IF contains key information about the target. Meanwhile, compared with N-GFCC+direct IF, the proposed N-GFCC+optimized IF reduces the computation time to one-third without affecting overall recognition accuracy.
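The noise-injection step can be sketched as follows, assuming the SNR is defined as the ratio of mean signal power to mean noise power over the whole segment. The SNR levels, the 300 Hz toy signal, and the function names are illustrative assumptions, not the settings behind Table 1; the five realizations per level mirror the five repeated tests.

```python
import numpy as np


def add_white_noise(x, snr_db, rng):
    """Return x plus zero-mean Gaussian white noise scaled so that
    10*log10(P_signal / P_noise) equals snr_db, with powers averaged over the whole segment."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)


def measured_snr_db(clean, noisy):
    """SNR actually achieved, computed from the clean signal and the added noise."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


fs = 22050
t = np.arange(fs * 5) / fs
clean = np.sin(2 * np.pi * 300.0 * t)        # toy stand-in for a recorded ship signal
rng = np.random.default_rng(1)

# Five independent noise realizations per SNR level, mirroring the five repeated tests.
for snr in (-10, -5, 0, 5, 10):
    runs = [measured_snr_db(clean, add_white_noise(clean, snr, rng)) for _ in range(5)]
    print(f"target {snr:+3d} dB -> mean measured {np.mean(runs):+.2f} dB")
```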

Conclusion
In this paper, we proposed an improved feature extraction method for underwater acoustic target recognition. An efficient IF extraction algorithm was applied to the subbands of a normalized Gammatone filterbank to construct an auditory feature extraction method that incorporates subband information. The method maintains recognition performance for ship-like underwater acoustic targets under low SNR conditions while significantly reducing the computational cost. Future work may focus on improving IF estimation accuracy to further raise recognition accuracy.