Underwater Target Tracking Using Forward-Looking Sonar for Autonomous Underwater Vehicles

When autonomous underwater vehicles (AUVs) carry out tasks, they need to reliably estimate the positions of moving underwater targets. While cameras often give low-precision visibility in a limited field of view, forward-looking sonar remains an attractive method for underwater sensing and is especially effective for long-range tracking. This paper describes an online processing framework based on forward-looking-sonar (FLS) images, and presents a novel tracking approach based on a Gaussian particle filter (GPF) to achieve persistent multiple-target tracking in cluttered environments. First, the characteristics of acoustic-vision images are considered, and median filtering and region-growing segmentation are modified to improve image-processing results. Second, a generalized regression neural network (GRNN) is adopted to evaluate multiple features of target regions, and a representation of feature subsets is created to improve tracking performance. Then, an adaptive fusion strategy is introduced to integrate feature cues into the observation model, and the complete procedure of underwater target tracking based on GPF is presented. Results obtained on a real acoustic-vision AUV platform during sea trials are shown and discussed. They show that the proposed method is feasible and effective in tracking targets in complex underwater environments.


Introduction
After decades of research and development, autonomous underwater vehicles (AUVs) are being accepted by an increasing number of users in various military and civilian establishments. AUVs sold to customers worldwide are becoming progressively more sophisticated through improvements in their self-governance capabilities, which allow them to deal with increasingly complex missions [1][2][3][4][5]. When AUVs move in unknown marine environments, there is relative motion between the AUV and the targets in the scene. Thus, it is of great significance for AUV autonomy to enhance moving-target prediction ability under complex dynamic backgrounds by using human perception [6][7][8].
Owing to the particularity of underwater environments, acoustic vision is still a useful means of long-distance measurement for AUVs, so understanding the motion state of underwater targets on the basis of acoustic-vision information is an important issue. Some significant achievements have already been obtained. Williams [9,10] used temporal feature measures to provide a quantitative description of a moving target's behavior over several scans, which was verified by a diver-tracking experiment. Furthermore, he discussed [11] a method of tracking underwater targets based on optical-flow theory, in which a tracking tree storing tracking information was constructed to enhance robustness. Chantler [12] and Ruiz [13] presented different approaches for classification and obstacle

FLS Overview
The acoustic images were acquired using a Seaking DST sonar, a type of FLS manufactured by Tritech [28]. The sonar is characterized by a fan-shaped beam that is rotated mechanically to create a spatial map of its surrounding area: it produces a single ping at each angle and waits for the return before stepping to the next angle, continuing until the entire sector is scanned. Returns from each ping are then used to create the image, as shown in Figure 1. This type of sonar is most commonly used for collision avoidance, but it also finds applications in mine detection and surveillance. Specifications of the sonar are shown in Table 1. Acoustic images are formed by the echo intensity from the three-dimensional environmental space. Despite its range advantage over standard vision, imaging sonar suffers from several drawbacks: (1) The number of transducers that can be packed in an array is physically restricted by transducer size. Thus, the resolution of an FLS image is low, and the gray level of the target area is generally small, so it is difficult to find details of targets inside it. (2) The scattering capability differs across parts of the target surface, and it is affected by the shape, material, and relative position between target and sonar. The incident angle of the acoustic wave also changes with target movement, so different regions may be generated for the same target in the acoustic image, and they often appear as unconnected regions in acoustic vision. (3) Multipath propagation is a distinctive feature in acoustic images, and multipath-reflected acoustic waves may have greater energy than those reflected directly from obstacles, leading to false or missed target detection and increasing the difficulty of acoustic-image processing.
Sensors 2020, 20, 102
Some images acquired under different conditions are shown in Figure 2. They show that the characteristics of acoustic images differ from those of optical images. Thus, some image-processing methods used for optical images have to be improved to obtain good results.
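The scanning geometry described in this section (one ping per bearing, stepped across a sector) can be illustrated with a minimal sketch that maps each range bin of a ping to Cartesian coordinates. The function names, the bearing convention (clockwise from the forward axis), and the equal-width range bins are assumptions made for illustration; they are not taken from the sonar's data format.

```python
import math


def ping_to_points(bearing_deg, intensities, max_range_m):
    """Map one ping (echo intensities along a beam) to (x, y, intensity) points.

    Assumed convention: y points ahead of the sonar head, x to starboard,
    and the bearing is measured clockwise from the forward axis.
    """
    theta = math.radians(bearing_deg)
    n_bins = len(intensities)
    points = []
    for k, echo in enumerate(intensities):
        # Use the centre of range bin k as its range (equal-width bins assumed).
        r = (k + 0.5) * max_range_m / n_bins
        points.append((r * math.sin(theta), r * math.cos(theta), echo))
    return points


def scan_sector(pings, max_range_m):
    """Accumulate all pings of one sector sweep into a single point list.

    `pings` is an iterable of (bearing_deg, intensities) pairs.
    """
    image = []
    for bearing_deg, intensities in pings:
        image.extend(ping_to_points(bearing_deg, intensities, max_range_m))
    return image
```

Because every bearing contributes the same number of range bins, points are dense near the sonar head and sparse at long range, which is one reason distant targets occupy few pixels.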

Feature Selection Based on GRNN
The main goal of feature selection is to choose, from the extracted feature set, a subset of features that yields the minimum classification error. In this work, a feature-selection method that combines a GRNN with search procedures such as sequential forward selection (SFS) and sequential backward selection (SBS) was used to discover the optimal subset of features.

Feature Description
It was supposed that the minimum outer rectangle of region $R_k$ was of size $m \times n$, $N_o$ the number of pixels of which $R_k$ consists, $N_{oe}$ the number of pixels of which the edge of $R_k$ consists, $N_b$ the number of pixels of which the background region consists, $S$ the number of intensity levels in the image, $h(i, j)$ the element of second-order histogram $H$, and $D_{oe}(i)$ the Euclidean distance from the $i$th point on the target's perimeter curve to the target's centroid, with $i = 1, 2, \ldots, N_{oe}$. Normalized central moments $\eta_{pq}$ of $f(x, y)$ were defined to be:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{1}$$

where $\gamma = (p + q + 2)/2$ for $p + q = 2, 3, \ldots, \infty$, $\bar{x} = \sum_x \sum_y x f(x, y) / \sum_x \sum_y f(x, y)$, and $\bar{y} = \sum_x \sum_y y f(x, y) / \sum_x \sum_y f(x, y)$.
Some possible features [29] were considered in this paper, which are described in Table 2.
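As a minimal sketch of the normalized central moment defined above, the following function evaluates $\eta_{pq}$ for a small image stored as a nested list. The function name and the `image[y][x]` storage convention are assumptions made for illustration.

```python
def normalized_central_moment(image, p, q):
    """Return eta_pq for a 2-D list `image`, where image[y][x] = f(x, y)."""
    if p + q < 2:
        raise ValueError("eta_pq is defined for p + q >= 2")
    # mu_00 equals the total intensity of the region.
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    # Intensity-weighted centroid (x_bar, y_bar).
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    # Central moment mu_pq about the centroid.
    mu_pq = sum(
        (x - x_bar) ** p * (y - y_bar) ** q * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )
    gamma = (p + q + 2) / 2
    return mu_pq / m00 ** gamma
```

For a horizontal bar of three unit pixels, the spread is entirely along x, so `normalized_central_moment(bar, 2, 0)` is positive while `normalized_central_moment(bar, 0, 2)` is zero.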

No. Function
Inverse difference moment $M_{co}$
Variance $M_{co}$
Difference entropy $M_{co}$

Search Procedure
The SBS method is a greedy search technique. Starting by measuring performance on the original (unchanged) dataset, it proceeds by measuring the classification performance of classifiers induced on datasets in which a single feature is omitted. The least significant feature is then identified as the one whose omission causes the lowest drop or highest gain in classifier performance [30]. This feature is omitted from the dataset, and the procedure is repeated recursively until the minimal required number of features remains or a certain stopping criterion is reached. The SBS procedure is shown in Table 3.

Table 3. Sequential-backward-selection (SBS) procedure.
1. Start with the full set $Y_0 = X$.
2. Remove the worst feature from the current set; repeat step 2 until the stopping criterion is reached.

In contrast to SBS, SFS starts with an empty dataset and proceeds by expanding it with the feature whose addition boosts the performance of the wrapped model the most. The algorithm adds features in this manner recursively until the stopping criterion is met [31]. The SFS procedure is shown in Table 4.

Table 4. Sequential-forward-selection (SFS) procedure.
1. Start with the empty set $Y_0 = \{\emptyset\}$.
2. Select the next best feature and add it to the current set; repeat step 2 until the stopping criterion is met.
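The two search procedures can be sketched as follows. This is an illustrative wrapper only: `score` is a hypothetical callable that returns a performance value for a feature subset (higher is better), and the stopping criterion is simply a target subset size `k`; neither is specified by the paper.

```python
def sfs(features, score, k):
    """Sequential forward selection: grow the subset one feature at a time."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Add the feature whose inclusion gives the best-scoring subset.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def sbs(features, score, k):
    """Sequential backward selection: shrink the full set one feature at a time."""
    selected = list(features)
    while len(selected) > k:
        # Drop the feature whose removal leaves the best-scoring subset.
        worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
        selected.remove(worst)
    return selected
```

In a wrapper setting such as the one described here, `score` would typically be the negated classification error of the classifier trained on the candidate subset. Both searches are greedy, so neither is guaranteed to find the globally best subset.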

GRNN for Classification
The GRNN proposed by Specht is a class of neural networks extensively used for function mapping between input and output variables [32][33][34][35]; its structure is shown in Figure 3. It is a one-pass learning algorithm with a highly parallel network, and it does not require an iterative procedure. Thus, it provides fast training, and its estimates can converge to the underlying (linear or nonlinear) regression surface even with sparse samples. That is, even with sparse data in a multidimensional measurement space, GRNN provides smooth transitions from one observed value to another; hence, it can be used for predicting, modelling, mapping, and interpolating continuous variables.
For the observed value $X$ of random variable $x$, the regression of random variable $y$ can be found using:

$$E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \tag{2}$$

where $f(X, y)$ is a known joint continuous probability-density function. When $f(X, y)$ is unknown, it should be estimated from a set of observations of $x$ and $y$. Probability estimator $\hat{f}(X, y)$ can be obtained by the nonparametric consistent estimator suggested by Parzen as follows:

$$\hat{f}(X, y) = \frac{1}{(2\pi)^{(m+1)/2}\, \sigma^{m+1}} \cdot \frac{1}{n} \sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^T (X - X_i)}{2\sigma^2}\right] \exp\left[-\frac{(y - Y_i)^2}{2\sigma^2}\right] \tag{3}$$

where $n$ is the number of observations, $m$ the dimension of vector variable $x$, and $\sigma$ the smoothing factor. $X_i$ and $Y_i$ are sample values of random variables $x$ and $y$. Substituting Equation (3) into Equation (2), the output $\hat{Y}(X)$ can be written as

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left[-\frac{(X - X_i)^T (X - X_i)}{2\sigma^2}\right]}{\sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^T (X - X_i)}{2\sigma^2}\right]} \tag{4}$$
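The one-pass nature of the GRNN can be seen in a minimal sketch of its output: the estimate is a kernel-weighted average of the stored sample outputs, with no iterative training. The function name and argument layout below are assumptions for illustration, and scalar outputs are assumed.

```python
import math


def grnn_predict(x, samples_x, samples_y, sigma):
    """Estimate Y(x) as a Gaussian-kernel-weighted average of sample outputs.

    x          -- query vector (sequence of floats)
    samples_x  -- stored input vectors X_i
    samples_y  -- stored scalar outputs Y_i
    sigma      -- smoothing factor (must be positive)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distances (x - X_i)^T (x - X_i).
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in samples_x]
    # Shift by the smallest distance for numerical stability; the shift
    # cancels between numerator and denominator.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, samples_y)) / sum(weights)
```

A small `sigma` makes the estimate follow the nearest sample closely, while a large `sigma` pulls it toward the mean of all sample outputs.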

Experiments and Analysis
Some experiments were carried out in the tank to obtain a representative subset of features, as shown in Figure 4. The targets consisted of a pontoon, a cube, a triangular prism, a reflector, and a sphere, which are shown in Figures 5 and 6. A series of FLS images under different situations were obtained, as shown in Table 5.

Table 5. Situations under which FLS images were obtained.
No. Situation
2 Only second target moves.
3 Only third target moves.
4 Only fourth target moves.
5 First and fourth targets move together in the same direction.
6 Second and fourth targets move together in the same direction.
7 Fourth and fifth targets move together in the same direction.
8 Third and fourth targets move together in the opposite direction, and their trajectories cross.
9 Third and fifth targets move together in the opposite direction, and their trajectories cross.
10 First and second targets move together in the opposite direction, and their trajectories cross.
11 Second and third targets move together in the opposite direction, and their trajectories cross.
12 Second, third, and fourth targets move together in the same direction.
13 Second, third, and fourth targets move together in the opposite direction.
14 Second target does not move.

The results of feature selection obtained by SFS and SBS showed no statistical rule within each individual test (Figure 6), so the average values of standard deviation were computed, which are shown in Figure 7. For the classification method, the average values obtained by SFS were smaller than those obtained by SBS when the number of selected features was less than 12. For the selected features, the average values declined as the number of selected features increased while that number was less than 5. This indicated that the newly added features brought useful descriptive information into the classification, so the accuracy of target classification improved. In contrast, when the number of selected features was more than 5, errors increased with the number of selected features. This indicated that the newly added features brought useless descriptive information into the classification, which had a greater effect on target classification, so the error rate rose. The results show that selecting more features was not beneficial for target classification. According to the results, the best choice may have been to select five types of features and to use SFS, which obtained the smaller classification error rate.
On the basis of the above conclusions, the sets of features were selected by SFS, and the statistical results of feature order are shown in Figure 8. As only five types of features were used to set up the feature set, the feature order was divided into six intervals (shown in Table 6), and the statistical results were rearranged accordingly, as shown in Figure 9. Then, the five types of features that had the most occurrences in interval B were selected; they are shown in Table 7.

Basic Principle
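The GPF addresses the resampling problem of the traditional particle filter by using a Gaussian density function to approximate the posterior probability distribution of the state [36,37]. The density of Gaussian random variable $x$ can be expressed as:

$$N(x; \bar{x}, \sigma) = (2\pi)^{-m/2}\, |\sigma|^{-1/2} \exp\left[-\frac{1}{2} (x - \bar{x})^T \sigma^{-1} (x - \bar{x})\right]$$

where $x$ represents an $m$-dimensional vector, $\bar{x}$ represents the mean of $x$, and $\sigma$ represents the covariance. When observation value $y_t$ at time $t$ is obtained, the posterior probability distribution is approximated as:

$$p(x_t \mid y_{0:t}) = C_t\, p(y_t \mid x_t)\, p(x_t \mid y_{0:t-1}) \approx C_t\, p(y_t \mid x_t)\, N(x_t; \bar{x}_t, \bar{\sigma}_t)$$

where $x_t$ represents the state value at time $t$; $y_{0:t}$ represents the observation sequence from 0 to $t$, that is, $y_{0:t} = \{y_0, y_1, \ldots, y_t\}$; $\bar{x}_t$ represents the predicted mean of $x_t$; $\bar{\sigma}_t$ represents the predicted covariance; and $C_t$ is a normalizing constant, expressed as follows:

$$C_t = \left[\int p(x_t \mid y_{0:t-1})\, p(y_t \mid x_t)\, dx_t\right]^{-1}$$

$p(x_t \mid y_{0:t-1})$ is the prior probability distribution, and the GPF measurement update approximates this prior probability distribution by Gaussian distribution $N(x_t; \bar{x}_t, \bar{\sigma}_t)$. Usually, the mean and covariance of $p(x_t \mid y_{0:t})$ are obtained by drawing $K$ samples $x_{t,n}$ $(n = 1, 2, \ldots, K)$ from importance function $q(x_t \mid y_{0:t})$.
Similarly, by approximating the posterior probability distribution with a Gaussian distribution function, the updated posterior probability distribution can be approximated as:

$$p(x_t \mid y_{0:t}) \approx N(x_t; \mu_t, \sigma_t)$$

After the measurement is updated, the GPF approximates predicted probability distribution $p(x_{t+1} \mid y_{0:t})$ by a Gaussian distribution. Then:

$$p(x_{t+1} \mid y_{0:t}) = \int p(x_{t+1} \mid x_t)\, p(x_t \mid y_{0:t})\, dx_t \approx \frac{1}{K} \sum_{n=1}^{K} p(x_{t+1} \mid x_{t,n})$$

In the formula, particle $x_{t,n}$ is obtained by sampling $N(x_t; \mu_t, \sigma_t)$. On the basis of observations at time $t$, by sequentially sampling state-transition distribution $p(x_{t+1} \mid x_{t,n})$ for $n = 1, 2, \ldots, K$, state particle $x_{t+1,n}$ at time $t + 1$ is obtained. Then, $\bar{x}_{t+1}$ and $\bar{\sigma}_{t+1}$ are calculated by the following formula:

$$\bar{x}_{t+1} = \frac{1}{K} \sum_{n=1}^{K} x_{t+1,n}, \qquad \bar{\sigma}_{t+1} = \frac{1}{K} \sum_{n=1}^{K} (x_{t+1,n} - \bar{x}_{t+1})(x_{t+1,n} - \bar{x}_{t+1})^T$$

Then, the predicted probability distribution of the GPF can be approximated as:

$$p(x_{t+1} \mid y_{0:t}) \approx N(x_{t+1}; \bar{x}_{t+1}, \bar{\sigma}_{t+1})$$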
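One measurement update and one time update of a Gaussian particle filter can be sketched for a scalar state as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the predicted Gaussian serves as the importance function, the state is one-dimensional, and `likelihood` and `transition` are hypothetical user-supplied callables.

```python
import math
import random


def measurement_update(pred_mean, pred_var, likelihood, k, rng):
    """Approximate the posterior by a Gaussian fitted to K weighted samples."""
    # Assumption: the predicted Gaussian is used as the importance function,
    # so each weight reduces to the likelihood of the sample.
    particles = [rng.gauss(pred_mean, math.sqrt(pred_var)) for _ in range(k)]
    weights = [likelihood(x) for x in particles]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    weights = [w / total for w in weights]
    mu = sum(w * x for w, x in zip(weights, particles))
    var = sum(w * (x - mu) ** 2 for w, x in zip(weights, particles))
    return mu, var


def time_update(mu, var, transition, k, rng):
    """Propagate samples of the posterior Gaussian and refit a Gaussian."""
    particles = [rng.gauss(mu, math.sqrt(var)) for _ in range(k)]
    # transition(x, rng) draws one sample from p(x_{t+1} | x_t = x).
    moved = [transition(x, rng) for x in particles]
    mean = sum(moved) / k
    pvar = sum((x - mean) ** 2 for x in moved) / k
    return mean, pvar
```

No resampling step appears: fresh particles are drawn from the fitted Gaussian at every stage, which is what distinguishes this scheme from a standard particle filter.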

Gaussian Particle-Filter Improvement
Although the particle filter provides a good probabilistic framework for target tracking, target regions in FLS images lack some of the details available in optical images, and information on region area, brightness, and contour is also unstable, so it is difficult to track a moving target in FLS image sequences on the basis of a single feature only. Thus, a method based on a feature set was used in this paper.

Likelihood-Function Representation
For the feature set selected in Section 3.4, we supposed that the distribution of the $i$th feature at time $t$ is expressed as $S_t^i = \{S_{t,n}^i\}_{n=1,\ldots,K}$, and that the reference model of the $i$th feature is $S_m^i$. Then, the likelihood function $p(y_t^i \mid x_t)$ based on the Gaussian model is written as in [38], where $y_t^i$ is the measurement of the $i$th feature cue, $\alpha$ is the likelihood-function noise value, $\beta_i$ is the distance control coefficient, and $d(S_t^i, S_m^i)$ refers to the distance between the target template feature and each particle feature.
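A Gaussian-model likelihood of this kind can be sketched as below. The exact expression from [38] is not reproduced in this text, so the form used here, $\exp(-\beta_i d^2 / (2\alpha^2)) / (\sqrt{2\pi}\,\alpha)$ with a Euclidean feature distance, is an assumption chosen only to be consistent with the roles of $\alpha$, $\beta_i$, and $d$ described above; the function names are hypothetical.

```python
import math


def feature_distance(particle_feature, reference_feature):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(particle_feature, reference_feature))
    )


def cue_likelihood(particle_feature, reference_feature, alpha, beta):
    """Gaussian-shaped likelihood of one feature cue for one particle.

    Assumed form: exp(-beta * d^2 / (2 * alpha^2)) / (sqrt(2 * pi) * alpha),
    where alpha acts as the noise value and beta scales the distance.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    d = feature_distance(particle_feature, reference_feature)
    norm = math.sqrt(2.0 * math.pi) * alpha
    return math.exp(-beta * d * d / (2.0 * alpha ** 2)) / norm
```

Under this form, a particle whose feature matches the template exactly receives the maximum value, and a larger `beta` makes the likelihood fall off faster as the distance grows.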

Feature-Set Fusion Strategy
Adaptive fusion (AF) is proposed to fuse the likelihood functions formed by the feature set; it adaptively adjusts the fusion strategy according to the tracking situation. When the feature cues are good, multiplicative fusion (MF) is selected to obtain a likelihood function with higher confidence. Otherwise, the strategy switches to weighted fusion (WF), which yields a more stable likelihood function.
WF is more stable for feature fusion under interference, and its expression is as follows:

$$p(y_t \mid x_t) = \sum_{i=1}^{n} a_i\, p(y_t^i \mid x_t)$$

where $a_i$ is the weighting coefficient of $p(y_t^i \mid x_t)$ and $\sum_{i=1}^{n} a_i = 1$. On the basis of the independence assumption for each feature, MF can achieve better tracking accuracy under less interference. The likelihood model for the multiplicative fusion of $m$ features is as follows:

$$p(y_t \mid x_t) = \prod_{i=1}^{m} p(y_t^i \mid x_t)$$

Considering the different advantages of WF and MF, the switch condition was set up on the basis of the feature cues, which could be assessed by the covariance matrix. The dimension of $x_t$ is denoted as $dim$, the covariance of the $i$th feature as $A_i$, and the corresponding covariance matrix as $\Delta_i$ [39,40]. Threshold $T_i$ was set for each cue to determine whether the cue was degenerated. The adaptive likelihood model then switches between MF and WF according to this test, where $a_i$ is computed by the fuzzy logic method; the algorithm is shown in Table 8.

Table 8. Computation procedure of $a_i$.
Algorithm of $a_i$ Calculation
1. Calculate value $f_{else,i}$, which is written as: $f_{else,i} = \sum_{j=0,\, j \neq i}^{n} (1/\Delta_j) / m$.
2. Design a fuzzy controller to translate $1/\Delta_j$ and $f_{else,i}$ to the fuzzy domain; the fuzzy-rule table is shown in Table 9.
3. Input $1/\Delta_j$ and $f_{else,i}$ into the fuzzy controller, and obtain fuzzy output $b_i$ of the $i$th feature.
4. Calculate the weighting coefficient $a_i$ of each feature.
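The switch between the two fusion rules can be sketched as follows. Several simplifications are assumed and should not be read as the paper's method: each cue's spread $\Delta_i$ is treated as a scalar, a cue counts as degenerated when its spread exceeds its threshold $T_i$, and the weights $a_i$ are taken proportional to $1/\Delta_i$ as a plain stand-in for the fuzzy controller of Table 8. All function names are hypothetical.

```python
def weighted_fusion(likelihoods, weights):
    """WF: convex combination of the per-cue likelihoods."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * p for a, p in zip(weights, likelihoods))


def multiplicative_fusion(likelihoods):
    """MF: product of the per-cue likelihoods (cues assumed independent)."""
    product = 1.0
    for p in likelihoods:
        product *= p
    return product


def inverse_spread_weights(spreads):
    """Stand-in for the fuzzy controller: a_i proportional to 1 / Delta_i."""
    inverse = [1.0 / d for d in spreads]
    total = sum(inverse)
    return [v / total for v in inverse]


def adaptive_fusion(likelihoods, spreads, thresholds):
    """Use MF while every cue is reliable, otherwise fall back to WF."""
    # Assumption: a cue is degenerated when its spread exceeds its threshold.
    degenerated = any(d > t for d, t in zip(spreads, thresholds))
    if not degenerated:
        return multiplicative_fusion(likelihoods)
    return weighted_fusion(likelihoods, inverse_spread_weights(spreads))
```

The design rationale follows the text: a product sharpens the fused likelihood when all cues agree, but a single unreliable cue with a near-zero value can suppress it entirely, whereas a weighted sum only reduces that cue's contribution.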

Target-Tracking Steps
According to GPF theory, the tracking-implementation steps based on FLS images are summarized as follows:
1. Initialization: select the targets of interest in the first image frame. After the image is processed, the target features in Table 7 are calculated, and the number of sample particles $K$ is determined. The initial importance function is assumed to be a normal distribution function whose mean is the center coordinate $(x_{target}, y_{target})$ of the target and whose covariance $\sigma$ is determined by the tracking environment; that is, particles drawn from the initial importance function along the x- and y-axes can be written as $N(x; x_{target}, 45)$ and $N(y; y_{target}, 40)$, and each particle is propagated according to the kinematics model.
2. Capture the image in the next frame and calculate the features of particles $\{x_{t,n}\}_{n=1}^{K}$. According to Equation (17), the feature cues are analyzed to check whether they are degenerated, and the fused weight of each particle is calculated. The particle weights are normalized as $\tilde{w}_{t,n} = w_{t,n} / \sum_{n=1}^{K} w_{t,n}$; then, $\mu_t$ and $\sigma_t$ are calculated.
3. Sample according to posterior probability distribution $N(x_t; \mu_t, \sigma_t)$ to obtain $\{x_{t,n}\}_{n=1}^{K}$. Then, $x_{t+1,n}$ can be calculated by the kinematics model. According to Equation (18), the predicted mean and covariance values are calculated. If targets are lost, the covariance value is expanded; otherwise, the procedure returns to Step 2.
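Steps 1 and 2 above can be sketched for a two-dimensional image position as follows. The sketch assumes that 45 and 40 are variances (the text calls the second parameter a covariance), that the x and y components are sampled independently, and that the fused particle weights are already available; the function names are hypothetical.

```python
import math
import random


def init_particles(x_target, y_target, k, rng, var_x=45.0, var_y=40.0):
    """Step 1: draw K particles around the selected target centre."""
    # Assumption: 45 and 40 are variances, so their square roots are the
    # standard deviations passed to the sampler.
    sx, sy = math.sqrt(var_x), math.sqrt(var_y)
    return [(rng.gauss(x_target, sx), rng.gauss(y_target, sy)) for _ in range(k)]


def normalize(weights):
    """Step 2: scale the fused particle weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weighted_mean_cov(particles, weights):
    """Step 2: weighted mean and 2x2 covariance of the particle set."""
    mx = sum(w * x for w, (x, _) in zip(weights, particles))
    my = sum(w * y for w, (_, y) in zip(weights, particles))
    cxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, particles))
    cyy = sum(w * (y - my) ** 2 for w, (_, y) in zip(weights, particles))
    cxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, particles))
    return (mx, my), ((cxx, cxy), (cxy, cyy))
```

The mean returned by `weighted_mean_cov` plays the role of $\mu_t$ and the covariance that of $\sigma_t$; expanding the covariance after a lost target would amount to scaling the returned matrix before the next sampling step.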

Example Test and Discussion
In order to evaluate the tracking method proposed in this paper, a series of tests were carried out in the tank and at sea. In the tank experiment, the method was compared with other methods in different motion scenarios, and its advantages were assessed. In the sea test, the method was deployed on the AUV system, and its adaptability was assessed. Center location error (CLE), the Euclidean distance between tracked center position $(x_p, y_p)$ and real center position $(x_g, y_g)$, was used to measure the tracking error. Its formula is:

$$CLE = \sqrt{(x_p - x_g)^2 + (y_p - y_g)^2}$$
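The error measure can be computed per frame and averaged over a sequence, as in the minimal sketch below. The function names and the averaging helper are illustrative additions; the text defines only the per-frame distance.

```python
import math


def center_location_error(tracked, truth):
    """Euclidean distance between tracked and real centre positions."""
    (xp, yp), (xg, yg) = tracked, truth
    return math.hypot(xp - xg, yp - yg)


def mean_cle(tracked_path, true_path):
    """Average CLE over a sequence of frames (illustrative helper)."""
    if not tracked_path or len(tracked_path) != len(true_path):
        raise ValueError("paths must be non-empty and of equal length")
    errors = [center_location_error(p, g) for p, g in zip(tracked_path, true_path)]
    return sum(errors) / len(errors)
```

For example, a tracked centre at (3, 4) against a real centre at (0, 0) gives a CLE of 5 pixels.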

Tank Experiment
In the experiment, three types of moving targets were used (shown in Figure 10). A trailer was selected as the platform on which the FLS was fixed. Due to the limited tank length, the trailer and FLS remained static during the whole experiment, and the targets were dragged by ropes on both sides of the tank. Parts of the test scene are shown in Figure 11. For each image sequence, the number of particles was set to 300 per frame, that is, 300 candidates were collected around the position of the target in the previous frame. The same image-processing algorithms, with identical parameter values, were used for all methods so that the accuracy of the proposed algorithm could be compared with that of the other algorithms.

Comparative Experiments of Tracking Methods
In order to analyze tracking performance, single-target tracking experiments were carried out (shown in Figure 12), and the results are shown in Figure 13. Figure 12 shows that the moving target approached the FLS from far to near, and the target region was quite changeable, but the proposed method could effectively track the target. During the entire tracking process, influences such as fluid resistance and the drag speed of the rope often caused the moving direction of the target to change suddenly. As a result, its trajectory did not follow an obviously regular motion, as shown in Figure 13a, but the target could be tracked by each method. In comparison, the tracking trajectory obtained by the proposed method was closer to the real trajectory. The extended Kalman filter (EKF) only approximates the nonlinear, non-Gaussian motion state. Its tracking accuracy is sensitive to the target-motion state, and cumulative error appears in the tracking process, so the CLE obtained by the EKF was larger than that obtained by the other methods, and it tended to diverge, as shown in Figure 13b. In contrast, the particle filter (PF) and the proposed method are nonlinear filtering methods based on Monte Carlo simulation, so the CLEs obtained by the PF and the proposed method stayed within a small stable interval, and tracking results were more accurate. The proposed method did not require strong prior knowledge in the state equation and measurement equation, so variation in the target movement had less influence on the tracking result. Thus, the CLE obtained by the proposed method was smaller, and tracking was more accurate.


Fusion-Strategy Experiments
In the experiments, targets kept moving along different paths, and only part of the results obtained with crossing and noncrossing trajectories are shown due to space limitations. Figure 14 shows that the targets moved in the same direction and approached the FLS from far to near. During the whole moving phase, the targets could be caught by all three fusion methods. Figure 15 shows that the target trajectories fluctuated unsteadily, which led to a larger tracking deviation for MF. In this situation, the proposed method selected the fusion algorithm by feature analysis. Because the feature cues were degenerated, WF was used to calculate the likelihood function, so the trajectories obtained were closer to those obtained by WF. Figure 16 shows that all CLE curves were affected by the unsteady motion of the targets, and they had the same trend. The CLE curve of the first target obtained by the proposed method almost coincided with that obtained by WF, but the CLE of the second target obtained by the proposed method was the smallest, showing that the proposed method could track targets more accurately.

Fusion-Strategy Experiments
Figure 17 shows targets moving in opposite directions along intersecting trajectories. All three fusion methods tracked the targets before they met. After the targets left the intersection point, MF could no longer track all of them. WF kept the first target but lost the second. By contrast, the proposed method tracked both targets continuously. Figure 18 shows that all predicted trajectories were close to the real trajectory before the targets met; the deviation obtained by MF was larger, but its trajectories were still similar to those obtained by WF and the proposed method. When the targets met, all predicted trajectories were affected. After the targets left the intersection point, the trajectory predicted by MF gradually strayed from the real target position, indicating tracking failure. With WF, tracking of the second target failed; the first target was still tracked, but its predicted trajectory fluctuated wildly, which reduced tracking accuracy. The trajectories predicted by the proposed method fluctuated only briefly and stayed within a controlled range, so tracking remained continuous. Figure 19 shows that, whereas the CLE obtained by the other methods diverged, the CLE obtained by the proposed method remained low and stable throughout tracking; the proposed method was therefore more robust and restored a smooth tracking curve more quickly under interference.
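The CLE curves discussed above can be computed with a few lines of code. The sketch below assumes that CLE is the center location error, that is, the per-frame Euclidean distance between the estimated and the real target center; the function names and the sample coordinates are hypothetical and are not data from the experiments.

```python
import math


def center_location_error(estimated, ground_truth):
    """Per-frame Euclidean distance between estimated and real centers."""
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    return [
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]


def mean_cle(estimated, ground_truth):
    """Average CLE over a whole track."""
    errors = center_location_error(estimated, ground_truth)
    if not errors:
        raise ValueError("trajectories must contain at least one frame")
    return sum(errors) / len(errors)


# Hypothetical two-frame track, with coordinates in meters.
estimated_track = [(0.0, 0.0), (3.0, 4.0)]
real_track = [(0.0, 0.0), (0.0, 0.0)]
print(center_location_error(estimated_track, real_track))
print(mean_cle(estimated_track, real_track))
```

A diverging CLE curve corresponds to the tracking failures described for MF and WF, whereas a curve that stays low and flat corresponds to continuous tracking.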

Sea Trial
To further evaluate the proposed method, a series of trials was carried out at sea, where the water depth was 10 m. An AUV named cShark, developed by Harbin Engineering University, was used as the moving platform. cShark is about 5.5 m long and 0.8 m wide, and the redundancy of its actuators provides important functionalities, such as accurate perception and fine motion. The targets were smaller than 1 × 1 m and were located 3 m underwater. Float balls were mounted on top of the targets and ballast was fixed to their bottoms so that the targets were suspended in the water. The targets were dragged by ropes and carried by the current at about 0.5-1 m/s. The AUV ran at the same depth as the targets at about 0.5 m/s, and the moving targets were tracked online by the FLS. The sea-trial scene is shown in Figure 20.

Acoustic-Vision-Based Processing Framework
The hardware architecture comprises two parts, shown in Figure 21a. The first is an acoustic signal-processing computer, which runs the sonar-controller software and the acoustic-image-processing software and passes the predicted measurements to the controller computer through a high-speed internal network. The second is an FLS, which faced forward with its detection range set to 50 m. On the basis of Marr's visual theory, the software architecture was developed in the C language and comprises two parts, the middle- and high-level layers (shown in Figure 21b). The middle layer performs image preprocessing, such as image-data interpolation, and forms acoustic images from echo data collected at different times. The high-level layer is the final implementation part: acoustic images are processed and target regions are extracted. The possible target region is predicted by the GPF, with the number of particles set to 200. The results are submitted to the control system for planning the AUV navigation route and are also used to determine the image-processing region in the next frame.
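To make the GPF prediction step concrete, the following sketch shows one time update and one measurement update of a generic Gaussian particle filter with 200 particles, the particle count stated above. Everything else is an assumption made for illustration: the state is a 2D target position, the motion model is a random walk with standard deviation `q`, and the likelihood is an isotropic Gaussian around a single position measurement with standard deviation `r`. It is written in Python for brevity, whereas the onboard software was developed in C, and it does not reproduce the improved GPF or the fused-feature observation model of the proposed method.

```python
import math
import random


def cholesky_2x2(cov):
    """Lower-triangular factor of a 2x2 symmetric positive-definite matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(max(c - l21 * l21, 1e-12))  # clamp rounding errors
    return l11, l21, l22


def sample_gaussian(mean, cov, n, rng):
    """Draw n points from the 2D Gaussian N(mean, cov)."""
    l11, l21, l22 = cholesky_2x2(cov)
    samples = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mean[0] + l11 * u, mean[1] + l21 * u + l22 * v))
    return samples


def fit_gaussian(points, weights):
    """Weighted mean and covariance; the weights must sum to 1."""
    mx = sum(w * p[0] for w, p in zip(weights, points))
    my = sum(w * p[1] for w, p in zip(weights, points))
    sxx = sum(w * (p[0] - mx) ** 2 for w, p in zip(weights, points)) + 1e-9
    syy = sum(w * (p[1] - my) ** 2 for w, p in zip(weights, points)) + 1e-9
    sxy = sum(w * (p[0] - mx) * (p[1] - my) for w, p in zip(weights, points))
    return (mx, my), ((sxx, sxy), (sxy, syy))


def gpf_step(mean, cov, z, rng, n=200, q=0.2, r=0.5):
    """One GPF cycle: Gaussian in, Gaussian out, no resampling."""
    # Time update: propagate samples of the posterior through the assumed
    # random-walk motion model and fit the predictive Gaussian.
    particles = sample_gaussian(mean, cov, n, rng)
    moved = [(x + rng.gauss(0.0, q), y + rng.gauss(0.0, q)) for x, y in particles]
    pred_mean, pred_cov = fit_gaussian(moved, [1.0 / n] * n)

    # Measurement update: weight samples of the predictive Gaussian by the
    # assumed Gaussian likelihood of the measurement z.
    particles = sample_gaussian(pred_mean, pred_cov, n, rng)
    log_w = [-((x - z[0]) ** 2 + (y - z[1]) ** 2) / (2.0 * r * r) for x, y in particles]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # avoid underflow
    total = sum(weights)
    weights = [w / total for w in weights]
    return fit_gaussian(particles, weights)


# Hypothetical run: repeated measurements of a target near (2, 1) meters.
rng = random.Random(0)
mean, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
for _ in range(15):
    mean, cov = gpf_step(mean, cov, (2.0, 1.0), rng)
print(mean)
```

Because the posterior is summarized by a mean and a covariance after every step, the estimated covariance could also be used to size the image-processing region for the next frame, which is one plausible reading of how the tracking result feeds back into the image processing.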

Target-Tracking Test under Noncrossing-Movement Condition
Figure 22 shows targets moving in the same direction, with their relative position varying as the targets and the AUV moved. The reflecting surfaces therefore changed, so the target regions in the FLS images gradually but noticeably changed. Throughout the motion, the real trajectory was not smooth and included sudden changes. Despite this, the proposed method maintained continuous and stable tracking from the beginning. Figure 23 shows that rope and current disturbances were more severe than in the tank experiment, so the real trajectories varied more sharply and the targets sometimes appeared to move in leaps and bounds. Nevertheless, the proposed method still tracked the targets stably, and the estimated trajectories were close to the real ones. Figure 24 shows that, because of the current and the AUV movement in the sea test, the tracking error was larger than in the tank experiment, and the CLE curves oscillated strongly. In general, most CLEs obtained by the proposed method remained low, which indicates that the method can maintain tracking under unstable target motion.

Target-Tracking Test under Crossing-Movement Condition
In the sea trial, it was hard to make more than two targets move along crossing paths. Therefore, the tracking of two targets with intersecting trajectories is considered. Results are shown in Figures 25-27.
In Figure 25, the targets move in different directions, and features such as area, length, and shape changed noticeably with the movement. Figure 26 shows that the target trajectories were not smooth. When the targets met, the predicted trajectories became more cluttered because of interference. The proposed method nevertheless locked onto the target positions accurately, so tracking remained continuous and steady. Figure 27a shows that the real trajectory of the second target often changed suddenly and over a larger range. As a result, tracking accuracy decreased, and the CLE curve of the second target oscillated strongly. Overall, however, the average CLE values of the first and second targets were about 1 m. Figure 27b shows some abrupt change points caused by the targets moving in leaps and bounds. In general, the CLE of most targets remained low most of the time, which indicates that the method is effective for target tracking.


Conclusions
In this paper, we proposed an AUV underwater-target-tracking framework based on acoustic images. Acoustic images received from an imaging sonar are unstable because of the characteristics of ultrasonic waves, which makes it difficult to detect and track targets continuously. To solve this problem, a GRNN was designed to select target features, and the effectiveness of the feature candidates was evaluated over a series of images. Furthermore, an adaptive fusion strategy was used to establish the observation model, and the improved GPF was adopted to track moving targets. The tank and sea tests illustrated that this method is flexible in tracking moving targets in cluttered unknown environments and that it can solve the target-tracking problem under the crossed-path condition. The next stage of this work is to use the methods presented in [41] and [42] to enhance the proposed fusion approach and classification results, and to apply the algorithm in more complicated ocean environments, where time-varying ocean currents and dynamic targets exist.