A SIFT-Like Feature Detector and Descriptor for Multibeam Sonar Imaging

Acoustic Science and Technology Laboratory, Harbin Engineering University, Harbin 150001, China Key Laboratory of Marine Information Acquisition and Security (Harbin Engineering University), Ministry of Industry and Information Technology, Harbin 150001, China College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China Zhejiang University, Hangzhou 310058, China


Introduction
In recent years, with the constant exploration of oceans, multibeam sonar (MBS) is being applied in many scenarios, such as underwater environment mapping [1], underwater target detection [2], and underwater terrain-aided navigation [3]. However, noise interference is strong in water, which makes it more difficult to extract useful information from the images. In order to extract and track features better, it is necessary to match the target features within the sonar image sequences. In the sonar image processing domain, highresolution sonar imagery is mainly affected by a granular multiplicative noise known as speckle noise, whose presence in MBS images is particularly strong [4][5][6]. In the feature extraction process, speckle noise pixels and the corner points of targets of interest have similar gradients. This greatly increases the probability of a false match. So, the focus of this work is to develop a new feature detector and descriptor that overcomes the effects of strong speckle noise in MBS imagery. than SURF. For the purposes of this application, SIFT and its variants are more suitable for feature matching in sonar images.
The SIFT algorithm, presented by Lowe [13], is an interesting option to detect, describe, and match features. Feature detectors and descriptors provide good performance due to their invariance to various transformations. Due to its efficiency in detection, description, and image matching, the SIFT algorithm and its variants are widely used in the field of computer vision and remote optical sensing. Kishu and Rana [14] applied principal component analysis (PCA) to reduce the SIFT descriptor's dimensionality and achieve better results in the field of facial recognition. For image scene classification, Ju et al. [8] worked directly with the two-dimensional matrix and applied generalized PCA (GPCA) to obtain a lower-dimensional matrix. The SIFT algorithm has also been applied in the field of object detection. Tao et al. [15] exploited the clustering information from matched SIFT keypoints and the region information extracted through image segmentation. Urban areas and buildings were detected in satellite images by Sirmacek and Unsalan [16] using SIFT and graph theory tools. Huang et al. [17] presented a novel interest point detection algorithm using a Laplacian-of-Bilateral (LoB) filter to detect extrema in scale space. In order to improve interest point detection, Li et al. [18] proposed a new learning-to-rank framework to eliminate unstable extrema derived from the keypoint detection stage of SIFT.
While the SIFT algorithm has proven its efficiency for various applications in remote optical sensing, the situation is different for sonar images. Sonar image sequences from the same scene appear different because several varying factors affect the imaging technique. These factors include electrical noise, speckle noise, decorrelation due to medium instabilities, and target motion. In particular, the strong speckle noise interference makes image processing very difficult because the gradient by difference operation in the keypoint detection step of the SIFT algorithm is susceptible to this type of noise. Researchers have proposed various methods to improve SIFT's performance in such cases. Depending on the application, some researchers removed or modified some steps of the algorithm to improve its performance [19,20]. Others suggested denoising or filtering the images during preprocessing to reduce the influence of speckle noise [21,22]. To suppress false matches, spatial relationships between keypoints have been considered [23]. However, the performance of these newly developed algorithms is still relatively limited, as the number of keypoints detected is not stable and there are insufficient correct matches. In the stage of feature extraction, most of these algorithms are based on Hessian matrices, which rely on the second derivatives. Meanwhile, most of these algorithms do not consider the statistical specificities of speckle noise. These reasons make them easy to adapt to speckle noise. In order to obtain consistent and well distinguishable feature points, it is of primary importance to adapt SIFT-like algorithms to this context.
Considering the gradient definition of existing algorithms [9][10][11][19][20][21][22], in this study, a new gradient definition adapted to MBS images is presented. This new gradient definition makes magnitude and orientation robust to speckle noise. It is then used in several steps in the new algorithm for the extraction of local descriptors. An application of this novel algorithm is then presented for target feature matching (including artificial and nonartificial targets) in MBS images.
In this paper, we propose a new algorithm for the extraction of local descriptors, adapted to the statistical specificities of MBS images with noise following a Weibull distribution. This algorithm is largely inspired by SIFT and will be referred to as MBS-SIFT. In Section 2, after analyzing the speckle noise and the background noise distribution of sonar images, a new gradient definition is presented, which yields a magnitude and an orientation robust to speckle noise. We introduce the MBS-SFIT algorithm, including both the detection of keypoints and the computation of local descriptors, in Section 3. Section 4 shows the experimental validation and performance possibilities offered by this new algorithm in MBS images. Finally, we conclude the paper in Section 5.

A New Gradient Definition for Multibeam Sonar Images
Actually, mostly due to speckle noise, traditional approaches in feature-based detection do not perform well on MBS images. In this section, the speckle noise of MBS images and the statistical distribution of the background noise are first described in detail. We then introduce a new gradient definition to replace the classical gradient by difference for the detection of keypoints and the computation of local descriptors.

Background Noise Distribution Estimation of Multibeam
Sonar Images. Speckle noise is one of the primary sources of noise in MBS images. It is mainly caused by interference affecting the returning acoustic wave inside the transducer due to the roughness of the material surface on the wavelength scale. The scattered signal adds coherently, producing patterns of constructive and destructive interference that appear as brighter or darker dots in sonar images. Speckle noise affects the image quality and obscures important information, such as edge and shape and intensity value. Furthermore, background noise distribution estimation is very important for target detection. Compared with the lognormal distribution and K-distribution, the Weibull distribution provides a trade-off between the accuracy of modeling and computational cost [2]. For these reasons, the Weibull distribution is more suitable to describe the statistical properties of this type of background noise. The Weibull distribution function is defined as where λ is the scale parameter and k is the shape parameter. 2

Journal of Sensors
The background noise can be represented as a set of samples X = ðx 1 , x 2 , ⋯, x i , ⋯, x n Þ with a likelihood function defined as After derivation, the maximum-likelihood estimators of k and λ are Figure 1 presents the estimated probability density function of the background noise. Figure 1(a) introduces the empirical histogram of the background noise pixels of the MBS imagery shown in Figure 1(b). Figure 1(b) was obtained using Blueprint's Oculus M750d MBS, whose parameters are described in more detail in Section 4.1. The parameters of the Weibull distribution were estimated from the image in Figure 1(b), whose size is 1350 × 2620 and contains only clutter. As we can see, the data in Figure 1(b) fit the Weibull distribution model well.

Gradient Computation for Multibeam Sonar Images.
A large number of edge detectors specifically developed for sonar images are based on the difference of average pixel values or a ratio of average (ROA). However, in speckled images, the ratio edge detector has been found to be much more effective than the classical gradient by difference [24]. Different from the arithmetic mean of the ROA operator, the ratio of exponentially weighted averages (ROEWA) operator is more accurate and stable in edge detection [25]. It computes exponentially weighted local means of the image, as shown in Figure 2.
For example, given a point (x 0 ,y 0 ), in the twodimensional ROEWA operator, the means for the vertical gradient direction (θ = π/2) are defined as where f ðx, yÞ = exp ð−ðjxj + jyjÞ/αÞ is the exponential weighting function and α is the exponential weight parameter. The ROEWA and its normalization for a direction θ are defined as Therefore, the ratio for the horizontal direction T 0,α and the vertical direction T π/2,α can be calculated using Equation (5b). Moreover, the gradient magnitude for each pixel is defined as Compared with the ROA operator, ROEWA is more robust to noise and more accurate in a multiscale edge context. ROEWA provides a good estimate of the gradient magnitude, but it is not a precise method to estimate the gradient orientation. This can be solved by increasing the number of directions, which adds to the computational cost. Inspired by the gradient-based edge detector for optical images [26], the gradient magnitude and orientation are estimated as However, the estimation of orientation shown in (7) is ambiguous. T 0,α and T π/2,α are always positive, and thus, Ori α can only take values between 0 and π/2. This is contradictory to the expectation that Ori α should take values between 0 and 2π. Moreover, the gradient computation on a vertical edge with reflectivities m a and m b (m a <m b ) yields T π/2,α = 1, Therefore, the gradient orientation takes arbitrary values depending on the reflectivities of the areas, when in fact it is equal to zero. In this paper, we use the gradient by ratio (GR), presented by Dellinger et al. [27], to avoid the above shortcomings. We define the horizontal and vertical gradients as Based on this definition, the gradient magnitude and orientation are as where α is the exponential weight parameter. The scale space is established by using different values of α during image filtering.

Journal of Sensors
In Section 2.1, we saw that the speckle noise of MBS images is randomly distributed. Here, we denote the pixel value and speckled noise of a point (x, y) as Iðx, yÞ and Nðx, yÞ, respectively. We then proceed to calculate the gradient magnitude as where I 1 ðx, yÞ, N 1 ðx, yÞ and I 2 ðx, yÞ, N 2 ðx, yÞ are the exponentially weighted values of the horizontal and vertical directions of the image, respectively. In the case of speckle noise, the direction distinction is unnecessary, as the noise intensities are assumed to be approximately equal in the horizontal and vertical directions. Thus, Equation (11) can be approximated as

Journal of Sensors
From Equation (12), we see that the gradient information of the image is independent of the region's value on both sides of the pixel and only related to the weighted ratio of the intensities in the two directions.
By using the logarithm, the aforementioned problem of determining the gradient orientation on a vertical edge is avoided, since the computation yields In order to obtain both negative and positive gradient values, no normalization is performed between the ratio and its inverse. With this approach, all possibilities of orientation values are taken into account. Figure 3 presents the gradient magnitude computed by difference and ratio on a rectangle corrupt using speckle noise. Compared with the region without noise, the values produced using differences (Figure 3(b)) are higher and inhomogeneous within the rectangle. Unlike the gradient by difference, the GR method does not produce higher values on the rectangle corrupt by noise, while the values are homogeneous (Figure 3(c)). This shows that gradient by difference is susceptible to noise, while GR is only affected by the neighborhood weighted ratio of pixels and the gradient magnitude is hardly affected by noise.

SIFT-Like Algorithm Adapted to Multibeam Sonar Images
In this section, we first introduce the original SIFT algorithm. It includes three steps, keypoint detection, orientation assignment, and descriptor representation. In order to adapt the SIFT algorithm to MBS images, we present the MBS-SIFT algorithm, the corresponding steps of which are keypoint detection using the MBS-Harris detector, orientation assignment, and descriptor extraction via the GR method.
3.1. Original SIFT Algorithm. The SIFT algorithm has been introduced to characterize local gradient information. The SIFT descriptor is a sparse feature representation that combines two operators: a feature detector and a feature descriptor. Detection involves selecting points of interest, and descriptors are then calculated to describe these features. The SIFT algorithm mainly consists of keypoint detection, orientation assignment, and descriptor representation.
3.1.1. Keypoint Detection. The first stage of keypoint detection is to select and identify position ðx, yÞ, scale σ, and orientation θ that can be repeatedly assigned under different conditions. The difference-of-Gaussian (DoG) scale space, as a close approximation of the scale-normalized Laplacian of Gaussian (LoG), is constructed with scales σ l = σ 0 ⋅ r l and l ∈ f0, 1, ⋯, l max − 1g. Each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below. Local extrema are then selected to obtain keypoints defined through their location and scale. Candidates with low contrast or located on edges are filtered using a threshold on the multiscale Harris corner detector, which is based on the Harris matrix. The Harris matrix and corner detector are, respectively, defined as  ⋅ σ, I σ is the convolution of the original image with a Gaussian kernel with standard deviation σ, t is an arbitrary parameter, and * is the convolution operator in x and y.
In this study, the keypoints are detected as local extrema in the LoG scale space. To eliminate points lying on edges or having low contrast, the multiscale Harris criterion is applied [28]. We refer to this approach as the LoG-Harris detector.
3.1.2. Orientation Assignment. By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be expressed relative to this orientation. In this manner, we achieve invariance to image rotation. After substantial experimental comparisons, Lowe presented a method for the calculation of a local histogram of gradient orientations, weighted by the gradient magnitudes and a Gaussian-weighted circular window [29]. The orientation histogram is formed on a scale-dependent region around the keypoint. The highest peak in the histogram is determined, and then, any other local peak above 80% of the maximum is also selected as an orientation. Therefore, there will be multiple keypoints detected at the same position and scale but with different orientations. In [13], the orientation histogram was divided into 8 bins covering the 360°r ange of orientations. Figure 4 shows a keypoint descriptor calculated to describe the local geometry in each keypoint Pðx, y, σ, θÞ. First, a square neighborhood is defined around the keypoint with the scale of the keypoint to achieve translation and scale invariance. The gradient orientations are then rotated relative to the keypoint orientation to achieve orientation invariance, illustrated using small arrows at each sample location in Figure 4(a). The gradient magnitudes are weighted using a Gaussian function to avoid sudden changes and to give less emphasis to gradients that are far from the center of the descriptor. This is indicated using a circular window in Figure 4(a).

Descriptor Representation.
This normalized neighborhood is then divided into 4 × 4 sample regions, as shown in Figure 4(b). Each region consists of eight directions for each orientation histogram,  Journal of Sensors with the length of each arrow corresponding to the magnitude of that histogram entry. For each keypoint, the SIFT descriptor is obtained by concatenating and normalizing these histograms. The original SIFT algorithm was designed to detect structures that are specific to optical photography with low noise conditions. In the LoG-Harris detector, the multiscale Harris criterion allows the suppression of low-contrast and edge points by applying a threshold r H on Rðx, y, σÞ. However, the speckle noise in MBS images leads to a stronger gradient magnitude on homogeneous areas with high reflectivity than on areas with low reflectivity. The SIFT algorithm does not perform well on MBS images, since the calculation of gradient and orientation relies on a classical gradient by difference.

Proposed MBS-SIFT Algorithm.
To adapt the SIFT algorithm to MBS images, it is necessary to consider their statistical properties. The new gradient calculation method for MBS images presented in Section 2.2 is applied in several steps of the proposed algorithm. Both the magnitude and the orientation obtained are robust to speckle noise.
This new algorithm, named MBS-SIFT, operates in a similar way as the original SIFT algorithm, i.e., with a feature detector followed by a feature descriptor. In this section, a new keypoint detector is introduced, as well as a new orientation assignment and a MBS-adapted feature descriptor. The outline of this algorithm is presented in Figure 5. The main contributions of this paper are the keypoint detection based on the MBS-Harris detector and the histogram of gradient orientation obtained using the GR method.

Keypoint Detection Based on MBS-Harris Detector.
LoG matrices rely on the second derivative, so they are not convenient to adapt to speckle noise. In contrast, the multiscale Harris function is based on the first derivative. Based on the new gradient computation adapted to MBS images mentioned in Section 2, we present a new algorithm to detect the keypoints.
Combining the multiscale Harris matrix, the function defined in (14), and the GR calculation, we propose the multiscale MBS-Harris matrix and the multiscale MBS-Harris function, respectively, as where the derivatives G x,α and G y,α are calculated in Equation (9) and d is an arbitrary parameter.
In the MBS-Harris detector, instead of the LoG scale space, we use a multiscale representation of the original image, which is achieved by the multiscale MBS-Harris function at different scales β m = β 0 ⋅ c m , where m ∈ f0, 1, ⋯, m max − 1g. Local extrema are then detected and compared with the 8 adjacent points of the same scale and the 18 points corresponding to the scales adjacent above and below. Here, a bilinear interpolation is used to refine the subpixel location of the keypoints, while we suppress low-contrast and edge   7 Journal of Sensors points by applying a threshold r MH on the multiscale MBS-Harris function R MH ðx, y, σÞ. Finally, the keypoints are characterized using their location and scale. The MBS-Harris detector improves robustness to speckle noise due to its use of the first derivative instead of the second derivatives adopted in the LoG-Harris detector. In Figure 6(c), it is easy to verify the efficiency of the MBS-Harris detector in the image with a rectangle corrupt using speckle noise; keypoints are only detected on the corners without spurious detections.
In Figure 6, the LoG-Harris detection is applied on the amplitude and the logarithm images, respectively. With this approach, keypoints are indeed detected near corners in the amplitude image (see Figure 6(a)), but they are poorly located with fewer spurious detections on homogeneous areas and on edges. Although LoG-Harris applied on the logarithm image performs well in dealing with additive noise, it is not robust enough to speckle noise, and there are still some spurious detections (Figure 6(b)). By adjusting the parame-ters on the multiscale Harris criterion, the number of false detections can be reduced, but the number of correct ones will be also decreased.

Orientation Assignment and Descriptor Extraction via the GR Method.
In the original SIFT algorithm, histograms of gradient orientation are calculated in the neighborhood of each keypoint and weighted using the gradient magnitude and a Gaussian window. As a result, both orientation assignment and descriptor extraction are insensitive to speckle noise. In this paper, we calculate these histograms using the GR method, as introduced in Section 2.2. First, we build a histogram of gradient orientations in the neighborhood of each point and segment it to obtain reference directions. A set of 9 circular histograms of gradient orientations with respect to the reference direction is then built, as shown in Figure 7. In log-polar coordinates, these histograms correspond to 9 disjoint regions (a central region, 4 regions on ange of orientations is divided into 12 bins. We select the main modes of the local orientation histogram using the a contrario mode selection method mentioned in [30], and each keypoint is allowed to have at most two orientations. Instead of using a square neighborhood and 4 × 4 square sectors with 8 orientation bins as in the original SIFT descriptor, we rely on a circular neighborhood and 9 disjoint regions with 12 orientation bins. Again, the GR method is used to compute the orientations. This descriptor is obtained in a very similar way to the circular-SIFT descriptor, i.e., by connecting the orientation histogram corresponding to the log-polar coordinates. The only difference is the use of GR instead of the gradient by difference. We refer to this resulting descriptor as the ratio descriptor.

Experimental Validation of the MBS-SIFT Algorithm
In this section, we first introduce the MBS and the parameter settings of algorithms presented in this paper. After processing the MBS images, the LoG-Harris detector and MBS-Harris detector are compared and analyzed. Finally, we combine the LoG-Harris detector and MBS-Harris detector with SIFT and the ratio descriptors, respectively, and evaluate their performance using ROC curves.

Data Acquisition and Parameter
Settings. MBS images were acquired using Blueprint's Oculus M750d MBS, which is intended for near-field target identification. Acoustic image sequences for fish detection were selected to assess the method proposed in this paper. The sonar covers a region over 70°ðhorizontalÞ × 12°ðverticalÞ and allows for the collection of 512 beams with an operating frequency of 1.2 MHz. Its angular resolution is 0.6°, and the maximum update rate is 40 Hz. For the LoG-Harris detector, the scale-space representation started from a detection scale of 0.5 and the scale factor between two levels of resolution was set to r = 2 1/3 . We used 17 scale levels. For the multiscale Harris criterion, the parameter t was empirically set to 0.04 and the threshold r H was set to 4000 for 8-bit sonar images but was adapted for each MBS image to obtain the same number of keypoints.
For the MBS-Harris detector, the parameters chosen were as follows: the first scale was set to β 0 = 2; the ratio between the two scales was c = 2 1/3 ; the number of scales was m max = 8; the arbitrary parameter of the MBS-Harris criterion was set to d = 0:04. The remaining scales were then obtained by setting β m = β 0 ⋅ c m . After an experimental study of interest points in sonar images, the threshold r MH applied on the multiscale MBS-Harris function was set to 0.05.

Results and Analysis of Keypoint Detection.
We used 54 MBS image pairs to verify the proposed algorithm. Single frame and sequences of MBS images were used to compare the different keypoint detectors. In Figure 8, we see sonar images of a single fish target and a metal chain. As expected, using the LoG-Harris detector (Figures 8(a) and 8(c)), the keypoints detected are mainly in high-reflectivity areas, but many false detections occur in the background corrupt by speckle noise. However, the keypoints detected using the MBS-Harris detector (Figures 8(b) and 8(d)) are mostly located on bright points and corners of the fish target and the metal chain. The number of false detections is considerably lower than that of the LoG-Harris detector. Figure 9 presents the keypoint detection of two fish targets in a sonar image sequence using the LoG-Harris and MBS-Harris detectors. In Figures 9(a)-9(d), many false detections are obtained using the LoG-Harris detector due to the influence of speckle noise. In Figures 9(e)-9(h), we see that the MBS-Harris detector performs better and the keypoints are detected on the corners and the bright points of the two fish targets.

Matching Performance Evaluation.
Keypoints of two sonar images are matched according to their descriptors. One approach is to select for each descriptor in the first image the nearest neighbor in the second image based on the minimum Euclidean distance. However, many interest points from the first image will not have any correct match in the second image because they were the result of the background's speckle noise influence. To match the two images robustly, a more effective measure is obtained by comparing the distance of the closest neighbor to that of the secondclosest neighbor. This matching criterion is named the nearest neighbor distance ratio method [13]. First, for each descriptor in the first image, we select the nearest and second-nearest neighbor in the second image based on the Euclidean distance. To filter false matches, the distances to the closest and second-closest neighbors are then compared. A threshold th is applied on the ratio of those respective distances. This measure performs well because correct matches need to have the closest neighbor significantly closer than the closest incorrect match to achieve reliable matching. ROC curves were computed for different combinations of keypoint detectors and descriptors by varying th. The keypoints of two images were matched according to their where N is the total number of possible matches, N T is the number of true correct matches, and N CM and N FM are the numbers of correct and false matches, respectively, at a certain value of the threshold th.
Here, we compare two feature detectors, the LoG-Harris detector and the proposed MBS-Harris detector, and two feature descriptors, the proposed ratio descriptor and the SIFT descriptor. The ROC curves of the four considered situations were calculated for the 54 MBS image pairs and are shown in Figure 10. We see from the figure that the best performance was achieved by the combination of the MBS-Harris detector and the ratio descriptor. Nearly 60% of correct matches were obtained at a false rate of 1%. The corresponding rate of the MBS-Harris detector/SIFT descriptor combination was 48%. For the other two configurations, this rate was less than 40%. In conclusion, the MBS-Harris detector was more robust to speckle noise in sonar images than the LoG-Harris detector, and the ratio descriptor achieves better results than the SIFT descriptor.

Computational Efficiency.
To evaluate the computational efficiency of the proposed MBS-Harris detector and ratio descriptor, the 54 MBS image pairs collected using Blueprint's Oculus M750d MBS were used to test and compare our method with the other three methods (LoG-Harris detec-tor+SIFT descriptor, LoG-Harris detector+ratio descriptor, and MBS-Harris detector+SIFT descriptor). The MBS images' size was 129 × 166. The hardware and software resources (i.e., Intel® Core™ i5-7200U CPU @ 2.50 GHz, 12 GB RAM, HDD 500 GB, MATLAB 2016a, and Windows 10 64-bit OS) were the same for all the above algorithms. The average running times of the proposed approach and the other three methods are presented in Table 1. The proposed MBS-Harris detector and ratio descriptor method took significantly less time to complete the required task and are therefore more computationally efficient.

Conclusion
In this paper, we present the MBS-SIFT algorithm, which combines a keypoint detector with a feature descriptor adapted to MBS images. A new gradient calculation approach specifically tailored to MBS images, the GR method, is applied in several steps of the proposed algorithm to make the proposed algorithm robust to speckle noise. Based on the multiscale Harris detector, this new keypoint detector offers consistent keypoints of various targets in sonar images. The robust gradient magnitudes and orientations obtained using GR allow the implementation of an efficient feature descriptor for MBS images.
From the ROC curves, we observe that the MBS-SIFT algorithm achieves better results than the traditional SIFT algorithm. In single frame sonar images, the keypoints are mainly detected on the corners and bright points of the targets. In the continuous multiframe sonar images, the features of multiple objects are detected consistently. Thus, a better matching effect is obtained. In future work, we will apply the algorithm to target tracking and the measurement of target velocity in sonar images. In addition, sonar image matching under different conditions will be considered.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.