A Ship Target Detection and Tracking Algorithm Based on Graph Matching

This paper proposes a target detection and tracking system based on the Hungarian graph matching algorithm and the Kalman filter. Specifically, the algorithm converts the detections produced on consecutive frames by a detection network such as YOLOv3 into graph-structured data, in which the targets and the positional relations between them are represented as the vertices and edges of a graph. The Kalman filter is then used to predict the position of each target in the next frame, and the maximum matching between the graph constructed from the current frame and the graph constructed from the next frame is determined with a metric that fuses appearance features and motion features, which achieves target tracking. For ships in nonlinear motion, this paper proposes replacing the traditional Kalman filter with an unscented Kalman filter based on the UT transformation. Experimental data show that this method effectively improves the tracking of ships in nonlinear motion.


Introduction
Target detection and tracking is one of the research hotspots in the field of computer vision. With the development of target detection technologies, tracking by detection [1] has become the main approach to multi-target tracking. The primary problem to be solved in multi-target tracking is how to associate information between frames. Traditional data association methods include the Joint Probabilistic Data Association Filter (JPDAF) [2] and Multiple Hypothesis Tracking (MHT) [3]. JPDAF assigns association probabilities according to the degree of association of each measurement and generates a single state hypothesis by weighting the measurements. MHT tracks all possible hypotheses, but the hypothesis set must be pruned to remain computationally tractable. Both methods are applicable to detection-based tracking and give good results, but they demand higher processor performance and increase computational and implementation complexity. This paper adopts a simpler and more efficient framework. The framework uses the Hungarian graph matching algorithm [4] together with an association metric that fuses appearance information and geometric information (bounding box overlap), Kalman filtering in image space, frame-by-frame data association, and cascade matching. This method achieves good detection performance while maintaining a high frame rate. Because the traditional Kalman filter lacks robustness in nonlinear scenes, the unscented Kalman filter is used in place of the Kalman filter module to predict the position of the target in future frames. The result is a multi-feature fusion target tracking method based on the Hungarian graph matching algorithm and the unscented Kalman filter [5].

Graph Matching Theory
A graph $G$ is represented as an ordered triple $(V(G), E(G), \psi_G)$, where $V(G)$ is a non-empty set composed of the vertices of the graph, $E(G)$ is the set of edges, which is disjoint from $V(G)$, and $\psi_G$ is the incidence function that associates each edge in $E(G)$ with an unordered pair of vertices. For any graph $G$ with $\nu$ vertices, there is a $\nu \times \nu$ matrix called the adjacency matrix of $G$. The adjacency matrix is expressed as $A(G) = [a_{ij}]$, where $a_{ij}$ is the number of edges connecting vertices $v_i$ and $v_j$ (0 if there is no connection). Let $M$ be a subset of $E(G)$. If all its elements are edges and no two elements of $M$ are adjacent to each other, then $M$ is said to be a matching of $G$. If there is no matching $M'$ in $G$ such that $|M'| > |M|$, then $M$ is said to be a maximum matching of $G$. The simplest way to find a maximum matching is to enumerate all possible matchings and select the largest one, but the time complexity of this exhaustive search is very high, so a more efficient method is needed.
If $P$ is a path connecting two unmatched vertices in graph $G$, and the edges belonging to $M$ and the edges not belonging to $M$ (that is, the edges that have been matched and the edges still to be matched) appear alternately on $P$, then $P$ is called an augmenting path of $M$. The Hungarian algorithm is a combinatorial optimization algorithm that solves the task assignment problem in polynomial time. It works by repeatedly searching for an augmenting path. If no augmenting path can be found, the maximum matching has been reached.
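The augmenting-path search described above can be sketched in a few lines. This is a minimal illustration for the unweighted bipartite case (tracks on one side, detections on the other), not the weighted Kuhn-Munkres assignment a tracker would normally run. The function name, the adjacency-list input format and the `-1` sentinel are assumptions made for the example.

```python
def maximum_bipartite_matching(adjacency, num_right):
    """Maximum matching in a bipartite graph by repeated augmenting-path search.

    adjacency[u] lists the right-side vertices (e.g. detections) that
    left-side vertex u (e.g. a track) may be matched with.
    Returns (matching size, match_right), where match_right[v] is the left
    vertex matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * num_right

    def try_augment(u, visited):
        # Depth-first search for an augmenting path that starts at u.
        for v in adjacency[u]:
            if v in visited:
                continue
            visited.add(v)
            # Either v is free, or the vertex holding v can be re-routed.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adjacency)):
        # Each successful search enlarges the matching by exactly one edge.
        if try_augment(u, set()):
            size += 1
    return size, match_right


if __name__ == "__main__":
    # Track 0 may take detection 0 or 1, track 1 only 0, track 2 takes 1 or 2.
    print(maximum_bipartite_matching([[0, 1], [0], [1, 2]], 3))
```

When track 1 arrives and finds detection 0 taken, the search re-routes track 0 to detection 1, which is exactly the alternation of matched and unmatched edges along an augmenting path.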

Kalman Filter Algorithm
The original formulation of the Kalman filter framework uses the eight parameters $(u, v, \gamma, h, \dot{u}, \dot{v}, \dot{\gamma}, \dot{h})$ to describe the motion state of the target, where $(u, v)$ is the coordinate of the center of the bounding box of the target, $\gamma$ is the aspect ratio of the bounding box, and $h$ is the height of the bounding box. The remaining four variables $\dot{u}, \dot{v}, \dot{\gamma}, \dot{h}$ represent the corresponding velocities of these four values in the coordinate system. In the initialization phase, these velocities are usually set to 0. A Kalman filter can be regarded as a combination of two parts: prediction and correction. In the prediction stage, the Kalman filter predicts the target position in the current frame from the state estimate of the previous frame. In the correction stage, the filter uses the value actually detected in the current frame to correct the prediction, which yields an estimate closer to the true result. The prediction process of the Kalman filter is given as

$$\hat{x}'_k = F\,\hat{x}_{k-1}, \qquad P'_k = F\,P_{k-1}\,F^{T} + Q \tag{1}$$

where $\hat{x}_{k-1}$ represents the estimated value of the Kalman filter, $P_{k-1}$ is the covariance matrix of the estimation error, $\hat{x}'_k$ represents the Kalman filter's predicted value of the target position in the next frame, $P'_k$ is the prediction error covariance matrix, $F$ is the state transition matrix, and $Q$ is the process noise covariance. The prediction is then corrected by

$$\hat{y}_k = z_k - H\,\hat{x}'_k, \qquad K_k = P'_k H^{T}\left(H P'_k H^{T} + R\right)^{-1}, \qquad \hat{x}_k = \hat{x}'_k + K_k\,\hat{y}_k \tag{2}$$

where $z_k$ represents the true value currently detected, $K_k$ is the Kalman gain, $\hat{y}_k$ is the measurement residual, $H$ is the measurement matrix, and $R$ is the measurement noise covariance. The covariance estimate is updated by

$$P_k = \left(I - K_k H\right) P'_k \tag{3}$$
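The prediction and correction stages can be illustrated with a reduced example. The sketch below tracks a single coordinate with a two-element state (position, velocity) instead of the eight-dimensional bounding-box state, so that the matrix products can be written out by hand. The constant-velocity transition, the position-only measurement, and the noise levels `q` and `r` are assumptions chosen for the illustration, not values from the paper.

```python
def kf_predict(x, P, dt=1.0, q=0.01):
    """Prediction stage for a constant-velocity model.

    x = [position, velocity], P = 2x2 covariance (list of lists).
    Assumed transition F = [[1, dt], [0, 1]] and process noise Q = q * I.
    """
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # P' = F P F^T + Q, expanded for the 2x2 case.
    P_pred = [
        [p00 + dt * (p10 + p01) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kf_update(x_pred, P_pred, z, r=1.0):
    """Correction stage with a position-only measurement, H = [1, 0]."""
    residual = z - x_pred[0]            # measurement residual
    s = P_pred[0][0] + r                # innovation variance H P' H^T + R
    k0 = P_pred[0][0] / s               # Kalman gain, position component
    k1 = P_pred[1][0] / s               # Kalman gain, velocity component
    x_new = [x_pred[0] + k0 * residual, x_pred[1] + k1 * residual]
    # P = (I - K H) P'
    P_new = [
        [(1 - k0) * P_pred[0][0], (1 - k0) * P_pred[0][1]],
        [P_pred[1][0] - k1 * P_pred[0][0], P_pred[1][1] - k1 * P_pred[0][1]],
    ]
    return x_new, P_new


if __name__ == "__main__":
    x, P = [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
    x, P = kf_predict(x, P, dt=1.0, q=0.0)
    x, P = kf_update(x, P, z=2.0, r=1.0)
    print(x, P)
```

The same two functions would be called once per frame and per track: `kf_predict` before association, `kf_update` after a detection has been assigned to the track.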

UT Transformation
The UT transformation first selects a small number of samples, called Sigma points, and then applies the nonlinear transformation to them to simulate the state space of the predicted target. From the transformed points it approximates the probability density of the transformed state and obtains its posterior mean and variance.
First select $2n+1$ Sigma points, denoted as $\chi_i$, with corresponding weights $W_i$, computed as

$$\chi_0 = \bar{x}, \qquad W_0 = \frac{\kappa}{n+\kappa} \tag{4}$$

$$\chi_i = \bar{x} + \left(\sqrt{(n+\kappa)P}\right)_i, \qquad W_i = \frac{1}{2(n+\kappa)}, \qquad i = 1, \dots, n \tag{5}$$

$$\chi_{i+n} = \bar{x} - \left(\sqrt{(n+\kappa)P}\right)_i, \qquad W_{i+n} = \frac{1}{2(n+\kappa)}, \qquad i = 1, \dots, n \tag{6}$$

where $\bar{x}$ and $P$ are the mean and covariance of the $n$-dimensional state, $\left(\sqrt{(n+\kappa)P}\right)_i$ is the $i$th row or column of the square root of $(n+\kappa)P$, $\kappa$ represents the scale factor, which controls how far the sampling points spread from the mean point, and $W_i$ represents the weight of the corresponding sample point. After the sampled points undergo the nonlinear transformation $f$, the transformed sample point set is obtained as

$$y_i = f(\chi_i), \qquad i = 0, 1, \dots, 2n \tag{7}$$

The point set obtained after the nonlinear transformation can be regarded as approximating the distribution of $y$. Weighted sums then approximate the mean and variance of the output sample points, corresponding to the first-order and second-order statistics

$$\bar{y} \approx \sum_{i=0}^{2n} W_i\, y_i \tag{8}$$

$$P_y \approx \sum_{i=0}^{2n} W_i \left(y_i - \bar{y}\right)\left(y_i - \bar{y}\right)^{T} \tag{9}$$

Substituting the Sigma points into the above expressions allows the accuracy of the UT transformation to be analyzed. When selecting sample points, a common strategy is symmetric sampling.
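A symmetric-sampling UT can be sketched as follows. This is a minimal illustration that assumes the classical single-parameter weighting (one scale factor, here called `kappa`, and the same weights for mean and covariance) and uses a Cholesky factor as the matrix square root. The function names and the plain-list matrix representation are choices made for the example.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean, cov, kappa):
    """Return the 2n+1 symmetric Sigma points and their weights."""
    n = len(mean)
    root = cholesky([[(n + kappa) * c for c in row] for row in cov])
    points = [list(mean)]
    weights = [kappa / (n + kappa)]
    for sign in (1.0, -1.0):
        for i in range(n):
            column = [root[r][i] for r in range(n)]  # i-th column of the root
            points.append([m + sign * c for m, c in zip(mean, column)])
            weights.append(1.0 / (2.0 * (n + kappa)))
    return points, weights


def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through f and return the approximated mean/cov."""
    points, weights = sigma_points(mean, cov, kappa)
    ys = [f(p) for p in points]
    dim = len(ys[0])
    y_mean = [sum(w * y[d] for w, y in zip(weights, ys)) for d in range(dim)]
    y_cov = [
        [
            sum(w * (y[a] - y_mean[a]) * (y[b] - y_mean[b])
                for w, y in zip(weights, ys))
            for b in range(dim)
        ]
        for a in range(dim)
    ]
    return y_mean, y_cov


if __name__ == "__main__":
    # For a linear map the UT reproduces the exact mean and covariance.
    linear = lambda p: [2.0 * p[0] + p[1], p[1] - 1.0]
    print(unscented_transform([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], linear))
```

Checking the transform against a linear map is a convenient sanity test, because in that case the weighted sums must return exactly the linearly transformed mean and covariance.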

Graph-based ship target detection and tracking algorithm
In this section, a target tracking method that uses the Hungarian graph matching algorithm and Kalman filter state prediction is proposed. Based on the characteristics of the target, a cascaded matching measurement mechanism that combines appearance and motion features is designed, and an improved Kalman filter algorithm is then proposed according to the actual application scenario.

Graph matching algorithm based on multi-metric fusion
The traditional way to associate the estimated state obtained by Kalman prediction with the moving target detected in the next frame is to solve a matching problem, and we use the same approach here.
With the Hungarian graph matching algorithm, the overall idea is to combine two kinds of information between the detection frame and the tracking trajectory: appearance information and motion information. The degree of matching between the detection result and the tracking trajectory is calculated by means of multi-metric fusion [6]. In terms of motion information, the Mahalanobis distance is used to describe the degree of match between the detection result and the tracking result at the motion level, as

$$d^{(1)}(i,j) = \left(d_j - y_i\right)^{T} S_i^{-1} \left(d_j - y_i\right) \tag{10}$$

where $d_j$ represents the position coordinates of the $j$th detection frame in the coordinate system, $y_i$ represents the value in the coordinate system of the target predicted by the $i$th tracker, and $S_i$ represents the covariance matrix between the detection position and the average tracking position.
From the above formula, we can see that the Mahalanobis distance measures how many standard deviations the position of the detected target lies from the mean position of the predicted result, and thus accounts for the uncertainty of the state estimate. In addition, the algorithm uses a binary indicator $b^{(1)}_{i,j}$ and sets the threshold on the Mahalanobis distance at a confidence level of 0.95. If the Mahalanobis distance of a candidate pair is less than the set threshold $t^{(1)}$, the association is regarded as a successful match. The corresponding value of $t^{(1)}$ in the four-dimensional space is 9.4877. When the trajectory of the target is well determined, the Mahalanobis distance can be used as a matching metric. However, the Kalman filter only gives a rough preliminary estimate of the target trajectory. When the camera is mounted on a mobile platform, the motion of the camera introduces rapid displacements in the image plane, which makes the Mahalanobis distance a highly uncertain measure.
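The gating step can be sketched as below. The example computes the squared Mahalanobis distance between a detection and a predicted track state and compares it with the 9.4877 threshold quoted in the text. The helper names, the plain-list matrices and the small Gaussian-elimination solver (used instead of an explicit matrix inverse) are assumptions made to keep the sketch self-contained.

```python
CHI2_95_4DOF = 9.4877  # threshold quoted in the text for four dimensions


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(detection, track_mean, track_cov):
    """Squared Mahalanobis distance (d - y)^T S^{-1} (d - y)."""
    diff = [d - y for d, y in zip(detection, track_mean)]
    weighted = solve(track_cov, diff)  # S^{-1} (d - y) without forming S^{-1}
    return sum(a * b for a, b in zip(diff, weighted))


def motion_gate(detection, track_mean, track_cov, threshold=CHI2_95_4DOF):
    """True if the detection lies inside the track's gating region."""
    return mahalanobis_sq(detection, track_mean, track_cov) < threshold


if __name__ == "__main__":
    cov = [[4.0, 0, 0, 0], [0, 4.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
    print(motion_gate([2.0, 2.0, 1.0, 1.0], [0.0] * 4, cov))
    print(motion_gate([6.0, 2.0, 0.0, 0.0], [0.0] * 4, cov))
```

A track with a large covariance accepts detections that are farther away in pixels, which is why the gate is expressed in standard deviations rather than in raw image distance.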
Based on this consideration, a second metric is introduced into the matching problem. For each detection frame $d_j$, a unit-norm feature vector $r_j$ describing its appearance is calculated. For each tracking target, we build a feature library $R_i$, which stores the feature vectors of the latest 100 frames that were successfully matched to that target. The second metric is the minimum cosine distance between the set of up to 100 features successfully matched to the $i$th trajectory and the feature vector of the $j$th detection result of the current frame, computed as

$$d^{(2)}(i,j) = \min\left\{\, 1 - r_j^{T} r_k^{(i)} \;\middle|\; r_k^{(i)} \in R_i \,\right\} \tag{11}$$

Similarly, we again introduce a binary variable $b^{(2)}_{i,j}$ to indicate whether an association between the two is admissible according to this metric.
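The per-track feature library and the minimum cosine distance can be sketched as follows. The class name `FeatureGallery` and its methods are hypothetical, and the features are normalized inside the class so that the dot product equals the cosine similarity. The budget of 100 follows the text.

```python
import math
from collections import deque


def unit(vector):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


class FeatureGallery:
    """Keeps the most recent appearance features matched to one track."""

    def __init__(self, budget=100):
        # deque(maxlen=...) silently drops the oldest feature when full.
        self.features = deque(maxlen=budget)

    def add(self, feature):
        self.features.append(unit(feature))

    def min_cosine_distance(self, feature):
        """Smallest cosine distance between `feature` and any stored feature."""
        query = unit(feature)
        return min(
            1.0 - sum(a * b for a, b in zip(stored, query))
            for stored in self.features
        )


if __name__ == "__main__":
    gallery = FeatureGallery()
    gallery.add([1.0, 0.0])
    gallery.add([0.0, 2.0])
    print(gallery.min_cosine_distance([3.0, 0.0]))
    print(gallery.min_cosine_distance([1.0, 1.0]))
```

Taking the minimum over the gallery, rather than an average, lets a track be re-identified as long as any one of its stored views resembles the new detection.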
The above two metrics act on different aspects and therefore complement each other. The Mahalanobis distance provides information about the possible position of the target at the motion level, which is very useful for short-term prediction. The appearance information captured by the minimum cosine distance makes it possible to match a target successfully when it reappears after disappearing from the field of view for a long period of time. In the final graph matching algorithm, we use a simple linear weighting of the two metrics as the final metric, as

$$c_{i,j} = \lambda\, d^{(1)}(i,j) + (1-\lambda)\, d^{(2)}(i,j) \tag{12}$$

where $d^{(1)}(i,j)$ is the Mahalanobis distance, $d^{(2)}(i,j)$ is the minimum cosine distance, and $\lambda$ is the weighting coefficient. It is worth noting that the fusion of metrics can only be performed when both metrics meet their respective threshold criteria, regardless of whether the value of $\lambda$ is 0 or 1.
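The weighted fusion and the requirement that both thresholds be met can be sketched together. In this illustration the motion threshold is the 9.4877 value from the text, while the appearance threshold `t_appearance = 0.2`, the `INFEASIBLE` sentinel cost and the function name are hypothetical choices for the example.

```python
INFEASIBLE = 1e5  # sentinel cost for pairs rejected by either gate


def fused_cost_matrix(motion, appearance, lam,
                      t_motion=9.4877, t_appearance=0.2):
    """Combine motion and appearance distances into one cost matrix.

    motion[i][j]     : squared Mahalanobis distance, track i vs detection j
    appearance[i][j] : minimum cosine distance, track i vs detection j
    lam              : weight of the motion term, between 0 and 1
    A pair is admissible only if it passes BOTH gates, whatever lam is.
    """
    cost = []
    for motion_row, appearance_row in zip(motion, appearance):
        row = []
        for d1, d2 in zip(motion_row, appearance_row):
            admissible = d1 < t_motion and d2 < t_appearance
            row.append(lam * d1 + (1.0 - lam) * d2 if admissible else INFEASIBLE)
        cost.append(row)
    return cost


if __name__ == "__main__":
    motion = [[4.0, 12.0]]
    appearance = [[0.1, 0.05]]
    print(fused_cost_matrix(motion, appearance, lam=0.5))
    # With lam = 0 the cost is appearance only, yet the motion gate still applies.
    print(fused_cost_matrix(motion, appearance, lam=0.0))
```

The resulting matrix can be passed to any assignment solver. Pairs marked `INFEASIBLE` should be discarded after solving so that a rejected association is never reported as a match.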

Cascade matching algorithm
Kalman filter prediction of the target position is fairly robust over short periods of time. However, when a target is occluded for a long time and then reappears in the field of view, the filter has kept predicting without any correction, and the resulting instability of the trajectory prediction greatly reduces the value of the estimated state. For a trajectory that has been lost for a long time, no new position information has been obtained, so the uncertainty of the trajectory prediction increases greatly. If two predicted trajectories then try to match the same detection result at the same time, the trajectory with the longer occlusion time has the greater matching weight, because its larger covariance yields a smaller Mahalanobis distance, which makes the detected target more likely to be matched to this trajectory.
In order to eliminate this influence, we add a cascade matching module [7]. The core of cascade matching is to process trajectories with the same disappearance time together and to give priority to objects that are seen more frequently, which encodes the notion of probability spread in the association likelihood. The corresponding expression is given in formula (13).
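The priority scheme can be sketched as a loop over track ages. This is an illustration under stated assumptions, not the exact procedure of the paper: `track_ages` holds the number of frames since each track was last matched, `max_age` and `max_cost` are hypothetical parameters, and the inner `greedy_assign` is a simple stand-in for the Hungarian solver so that the block stays short.

```python
def greedy_assign(cost, rows, cols, max_cost):
    """Stand-in for the Hungarian solver: take the cheapest pairs first."""
    candidates = sorted(
        (cost[r][c], r, c) for r in rows for c in cols if cost[r][c] <= max_cost
    )
    used_rows, used_cols, pairs = set(), set(), []
    for _, r, c in candidates:
        if r not in used_rows and c not in used_cols:
            pairs.append((r, c))
            used_rows.add(r)
            used_cols.add(c)
    return pairs


def matching_cascade(cost, track_ages, num_detections, max_age=30, max_cost=1.0):
    """Match recently seen tracks before tracks that have been lost longer.

    cost[i][j]    : association cost of track i and detection j
    track_ages[i] : frames since track i was last matched (1 = previous frame)
    Returns (matches, unmatched_detections).
    """
    unmatched = list(range(num_detections))
    matches = []
    for age in range(1, max_age + 1):
        if not unmatched:
            break
        tracks = [t for t, a in enumerate(track_ages) if a == age]
        if not tracks:
            continue
        found = greedy_assign(cost, tracks, unmatched, max_cost)
        matches.extend(found)
        taken = {c for _, c in found}
        unmatched = [d for d in unmatched if d not in taken]
    return matches, unmatched


if __name__ == "__main__":
    # Track 1 (lost for 5 frames) is cheaper for detection 0, but track 0
    # (seen in the previous frame) is served first.
    cost = [[0.3, 0.9], [0.1, 0.8]]
    print(matching_cascade(cost, track_ages=[1, 5], num_detections=2))
```

A single global assignment over the same cost matrix would hand detection 0 to the long-lost track. Processing ages in increasing order is what prevents that outcome.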

Matching tracking algorithm based on unscented Kalman filter
The Kalman filter algorithm has a wide range of applications. It predicts the target position well when the target is in simple motion, but it is an optimal estimator only for linear systems, so tracking errors arise for target ships in nonlinear motion. To address this, we use the unscented Kalman filter algorithm, which extends the Kalman filter with the UT transformation. The unscented Kalman filter algorithm is mainly composed of two parts, the UT transformation and the Kalman filter algorithm [8], and it is divided into two major stages: the time update of the target state and the observation update of the target state. The first step in the time update of the target state is to select $2n+1$ Sigma points of the target state and then obtain the posterior mean and variance through the UT transformation. The observation update of the target state is similar to the state update in the Kalman filter algorithm, but the specific formulas are different.
We assume that the process noise and the observation noise introduced into the system are both additive Gaussian white noise, and that the process noise of the system is not correlated with the observation noise. Based on the above premise, the flow of the unscented Kalman filter algorithm is as follows. First, initialize the state variable and the error covariance matrix, as in formula (14).
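The two stages can be illustrated with a scalar example. The sketch below is a one-dimensional unscented Kalman filter step with additive noise, written under assumptions of its own: three symmetric Sigma points with a single scale factor `kappa`, user-supplied motion and measurement functions `f` and `h`, and noise variances `q` and `r`. It is meant to show the order of operations rather than the full multi-dimensional filter.

```python
import math


def sigma_points_1d(mean, var, kappa):
    """Three symmetric Sigma points and weights for a scalar state (n = 1)."""
    spread = math.sqrt((1.0 + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (1.0 + kappa), 0.5 / (1.0 + kappa), 0.5 / (1.0 + kappa)]
    return points, weights


def ukf_step(x, P, z, f, h, q, r, kappa=2.0):
    """One time update followed by one observation update."""
    # Time update: propagate the Sigma points through the motion model f.
    points, w = sigma_points_1d(x, P, kappa)
    fx = [f(p) for p in points]
    x_pred = sum(wi * v for wi, v in zip(w, fx))
    P_pred = sum(wi * (v - x_pred) ** 2 for wi, v in zip(w, fx)) + q

    # Observation update: redraw Sigma points around the prediction and
    # push them through the measurement model h.
    points, w = sigma_points_1d(x_pred, P_pred, kappa)
    hz = [h(p) for p in points]
    z_pred = sum(wi * v for wi, v in zip(w, hz))
    S = sum(wi * (v - z_pred) ** 2 for wi, v in zip(w, hz)) + r
    cross = sum(wi * (p - x_pred) * (v - z_pred)
                for wi, p, v in zip(w, points, hz))

    K = cross / S                      # Kalman gain
    x_new = x_pred + K * (z - z_pred)
    P_new = P_pred - K * S * K
    return x_new, P_new


if __name__ == "__main__":
    # A mildly nonlinear motion model with a direct position measurement.
    x, P = ukf_step(1.0, 0.5, z=1.4,
                    f=lambda s: s + 0.1 * math.sin(s),
                    h=lambda s: s, q=0.01, r=0.1)
    print(x, P)
```

With identity functions for `f` and `h`, the step reduces to the ordinary Kalman filter update, which gives a simple way to check an implementation before switching to a nonlinear model.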

Experiments
This experiment combines the Unscented Kalman Filter (UKF) and the matching tracking algorithm based on multi-feature fusion described in the previous section into a multi-feature and UKF fusion matching and tracking algorithm, and uses the verification set to compare the robustness and real-time performance of the improved algorithm with those of the original. For initialization, both the original algorithm and the improved algorithm use the YOLOv3 detection network to obtain the location coordinates of each ship in the video. Under the constraints of 0 and 30 , the detection threshold is set to a confidence value of 0.3.
First, the tracking quality is evaluated quantitatively with commonly used indicators: the tracking success rate, the area under the success rate curve, and the tracking speed. The experimental results show that the matching tracking algorithm proposed in this paper, which fuses multiple features with the UKF, is more robust than the original algorithm. The FPS of the improved algorithm is slightly reduced, but it still meets the real-time requirements of practical applications.
In addition, a qualitative analysis of the tracking effect of the UKF and multi-feature fusion algorithm is carried out on an example visible-light ship video clip. The video to be detected has heavy background clutter and camera shake. Detection screenshots of near and distant video clips are selected for analysis, and the detection effect under extreme weather conditions and the detection effect for ships approaching from far to near are tested. Figure 4(a) shows the tracking effect when the camera is close to the target.

Figure 2. Example video tracking effects under different viewing angles and different weather conditions

It can be seen from the video that the YOLOv3 network has a high target detection rate, and only a few targets that lack typical ship characteristics cannot be identified. Vessels often occlude each other. After occlusion, the trajectories of some ships can be matched to their trajectories before the occlusion, but some targets that have been occluded for a long time are recognized as new targets when they reappear. The experiments show that the graph matching target detection and tracking network can complete the multi-target tracking task against a sea background, and that it copes better with object occlusion and frequent label switching.

Conclusion
This paper introduces a ship target tracking method based on the Kalman filter algorithm and the graph matching algorithm for practical application scenarios. The method uses the Hungarian graph matching algorithm, the Kalman filter, and cascade matching. It designs a matching measurement mechanism that integrates appearance and motion features according to the needs of the application scenario, and it improves the Kalman filter algorithm so that the system is robust in nonlinear environments. Analysis of the measured data shows that the algorithm can track targets in real time while maintaining high accuracy and processing speed, and that it is robust across different scenarios.