A dynamic ship target recognition method based on image recognition and VR technology

Effective identification of ship identity is an important means of water traffic management. To address the technical limitations of AIS, radar and similar means, a ship automatic recognition method based on machine learning is proposed, and a water traffic visualization system integrating multi-source recognition technology is implemented with VR technology. Tests show that the method and system can overcome the identification problems caused by easily tampered data, hard-to-distinguish massive targets and large obstacles, and provide an effective means of navigation support in key waters.


INTRODUCTION
With rapid socio-economic development and the deepening of water transportation, the number of ships keeps increasing and ship types are diversifying, including oil tankers, bulk carriers, container ships and fishing boats, which makes coordinating ship movements difficult. At the same time, ships are becoming larger and faster, which not only increases the difficulty of operating the ship itself but also places higher requirements on the channel: a good traffic environment, special measures and services must be provided for safe navigation. In addition, the growing volume of dangerous goods carried on board and the restrictions of controlled river sections have increased the navigation pressure on inland waterways, causing traffic congestion and latent risks of water traffic accidents, threatening shipping safety and the river ecological environment, and restricting normal socio-economic operation [1]. Intelligent tracking and monitoring of inland waterways, especially of ships navigating controlled river sections, will therefore help improve navigation command decision-making and the navigation conditions of those sections.
Dynamic monitoring of water traffic can detect and coordinate maritime traffic targets in time. It is an important technical means of improving the efficiency and safety of ships entering and leaving port, berthing and unberthing, and navigating the port channel. At present, VTS, AIS, radar, real-time surveillance video, GNSS positioning terminals, binding of mobile apps to ship locations, ship reports and remote identification (RFID) are the main technical means of dynamic supervision of water traffic [2][3].
However, inland waterways present more technical obstacles: narrow channels, many intersections, high ship density, numerous navigation obstructions, large variations in channel water depth, a complex navigation environment, and long, wide navigation areas. These make it difficult to apply image recognition technology to inland ship tracking and monitoring. In summary, existing data fusion methods cannot fuse images with ship positioning, which degrades ship tracking accuracy at the port and navigation efficiency in the port channel.
This study uses modern information collection and AR monitoring technology to monitor ship traffic conditions, rectify ship traffic order and assist ship navigation. Through these measures, establishing good traffic order and reducing collisions, groundings, reef strikes and other maritime accidents is of great significance for improving the navigation capacity of inland waterways, reducing ship traffic accidents, reducing pollution and raising the level of informatization [4]. At the same time, it helps strengthen waterway law enforcement and improve the efficiency of waterway supervision. Research on port digital service technology can also promote applications in land traffic tracking and other video tracking and monitoring fields.

Ship target detection in HSV color space
Detecting ship targets in HSV color space differs from frame-difference detection: the frame-difference method is continuous multi-frame dynamic target detection, while HSV color-space detection is single-frame static target detection [5]. Because the color and texture of water differ markedly from those of a ship target, the hue, saturation and value of water in an HSV image differ from those of the ship.
Principle of HSV-space ship target detection: analysis shows that the upper hull of most ship targets is brightly colored, with an S component clearly different from that of water, so the S component is an important cue for detecting the hull. The lower hull, however, is in long-term contact with the water surface and is generally gray with a relatively low value, especially on small cargo ships; there is therefore a certain difference between the V component of the lower hull and that of the water, making the V component more effective for detecting the lower hull. Combining S- and V-component detection yields a more complete target. In addition, shadows cast by shore objects, or on the hull by lighting, degrade detection. Because the hue (H) component of a shadow is monotonous and fixed within a certain range, the influence of shadows can be removed by determining that range experimentally.
Process of HSV-space ship target detection: since the measurement scene is a water area, ship targets on the water must be separated out. To eliminate the influence of the shore, a detection area is set within the water. Since the water in the detection area is generally larger than the ship target area, the maxima of the histograms of the S and V components in the detection area are taken as the water-background reference values T(s) and T(v). The S and V components at the virtual-coil position of the current frame are then compared against T(s) and T(v): if the absolute difference exceeds a threshold, the pixel is marked 1, otherwise 0. The binary maps of the two components are then OR-fused, with pixels marked 1 judged as target and pixels marked 0 as water background. Finally, shadows are removed using the H component at the virtual-coil position: when a pixel marked 1 has an H value within the fixed shadow range, it is reset to 0.
Figure 1. Basic technical process of the ship target recognition method.
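Assuming an OpenCV-style HSV image (H in 0-179, S and V in 0-255), the per-pixel steps above can be sketched as follows; the function and parameter names, and the example shadow hue range, are illustrative assumptions rather than values from the paper:

```python
import numpy as np

def water_reference(channel):
    """Water-background reference value: the histogram maximum of an
    S or V channel over the detection area (assumes uint8 values)."""
    return int(np.argmax(np.bincount(channel.ravel(), minlength=256)))

def detect_ship_hsv(hsv_roi, t_s, t_v, thr_s, thr_v, shadow_h=(90, 130)):
    """Single-frame ship detection in the HSV detection area.
    A pixel is marked 1 when its S or V value differs from the water
    reference (t_s, t_v) by more than the threshold; the two channel
    masks are OR-fused, and pixels whose hue falls inside the assumed
    shadow range are reset to 0."""
    h = hsv_roi[..., 0].astype(int)
    s = hsv_roi[..., 1].astype(int)
    v = hsv_roi[..., 2].astype(int)
    mask_s = np.abs(s - t_s) > thr_s          # upper hull: saturation cue
    mask_v = np.abs(v - t_v) > thr_v          # lower hull: value cue
    target = mask_s | mask_v                  # OR-fusion of the two maps
    shadow = (h >= shadow_h[0]) & (h <= shadow_h[1])
    return (target & ~shadow).astype(np.uint8)
```

In practice the reference values would be recomputed per frame from the detection-area histograms, and the thresholds tuned experimentally as the text describes.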

Virtual coil target capture
Several problems must be solved in ship target data acquisition and capture. First, the target should be captured when it appears at the best position in the camera view. Second, the captured ship target must be complete in the image and its screen proportion must meet the standard requirements. Third, a captured ship target must be marked so that it is not captured repeatedly.

2.2.1. Setting of the virtual detection coil and detection area: on the video image of the fixed camera, the virtual coil is set in the center of the water area, perpendicular to the ships' direction of navigation, and is composed of three rows of coil blocks side by side, each adjacent to 5 to 8 other coil blocks. The detection area, set to support the HSV-space ship detection method, covers the water-surface region where ships (rather than the shore) usually appear.

2.2.2. Virtual coil block detection: each coil block, as an independent calculation unit, uses the frame-difference method or the HSV color-space detection method to determine whether the change of each of its pixels exceeds the set threshold, and then counts the ratio of pixels exceeding the threshold to the total pixels of the block. If the ratio is greater than 50%, the virtual coil block is judged to have detected a target and is marked 1; otherwise it is marked 0.
Figure 2. Ship target capture by CCTV.
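As a minimal sketch, the per-block decision described above can be written as follows (the function name and the boolean change-mask input are assumptions):

```python
import numpy as np

def coil_block_state(change_mask):
    """Mark a virtual coil block as 1 when more than 50% of its pixels
    exceeded the frame-difference or HSV detection threshold, else 0.
    change_mask: boolean array with one entry per pixel of the block."""
    ratio = np.count_nonzero(change_mask) / change_mask.size
    return 1 if ratio > 0.5 else 0
```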

Ship location and segmentation
The core task of ship target location and segmentation is to obtain the position and contour information of every ship target in a given image, so that each ship target can be accurately separated from the original image. The practical approach is to detect all ship targets in the image and frame each object with a rectangular box. This is a basic task in computer vision: target detection. It is the core module on which the whole system is built; only by accurately framing all ship targets can an efficient ship identification and monitoring system be constructed. Target detection algorithms fall roughly into traditional algorithms and algorithms based on deep learning.
Another category of deep-learning target detection algorithms is not based on region proposals; its representative is the SSD algorithm [6]. The design is end-to-end: candidate-region generation, feature extraction, location-offset prediction and category prediction are encapsulated in a single network, which greatly improves overall performance. SSD offers one of the best combinations of real-time speed and detection accuracy among detection networks, so this system is built on the SSD algorithm.
In the training phase, the SSD framework consists of input, feature extraction, default-box generation, matching, a prediction network and a loss function, as shown in Figure 3 below.
Figure 3. Framework and flow diagram of the SSD algorithm training stage.
SSD treats location prediction and category prediction as simultaneous regression problems, so its loss function combines a location loss and a confidence loss in multi-task form. The loss function is an important driving force of the neural network, because its optimization determines what the network learns. SSD therefore adopts the smooth L1 loss as the position loss and the softmax loss as the confidence loss, combined through a weight factor $\alpha$. Formula (1) takes the standard SSD form:

$$L(x, c, l, g) = \frac{1}{N}\big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\big) \tag{1}$$

where $N$ is the number of matched default boxes; if $N = 0$, the loss is set to 0. $\alpha$ is a weight coefficient that balances the two parts of the loss so that neither dominates. The position loss of formulas (2) to (6) is the smooth L1 regression loss over the coordinate offsets:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^m - \hat{g}_j^m\big) \tag{2}$$

where $x_{ij}^{k}$ indicates the matching condition: it is 1 when the $i$-th default box matches the $j$-th ground-truth box with true category $k$, and 0 otherwise; $l$ is the predicted coordinate offset, $\hat{g}$ is the coordinate offset of the matched ground-truth box relative to the default box, and $d_i$ denotes the default box. The confidence loss is the softmax loss over class confidences:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big)$$

where $c$ is the confidence and the other variables are as above.
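The multi-task combination of smooth L1 position loss and softmax confidence loss can be sketched in NumPy. This is an illustrative re-implementation, not the paper's code; the simplified interface (already-matched boxes, no hard-negative mining) is an assumption:

```python
import numpy as np

def smooth_l1(x):
    # smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def ssd_loss(loc_pred, loc_true, cls_logits, cls_true, alpha=1.0):
    """Multi-task SSD loss over N matched default boxes (formula 1).
    loc_pred, loc_true: (N, 4) predicted and target coordinate offsets.
    cls_logits: (N, C) class scores; cls_true: (N,) true class indices."""
    n = len(loc_pred)
    if n == 0:                      # no matched boxes: loss defined as 0
        return 0.0
    l_loc = smooth_l1(loc_pred - loc_true).sum()
    # softmax confidence loss (numerically stabilized)
    z = cls_logits - cls_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    l_conf = -np.log(probs[np.arange(n), cls_true]).sum()
    return (l_conf + alpha * l_loc) / n
```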

3.1. Ship feature extraction
In this study, the ORB (Oriented FAST and Rotated BRIEF) method is used to extract ship target features. ORB is a fast feature-point extraction and description algorithm. The steps of ORB feature extraction and the corresponding feature matching are as follows [7].
ORB uses the FAST (Features from Accelerated Segment Test) algorithm to detect feature points. Taking the point under test as the center, its gray value is compared with those of the 16 pixels on a surrounding circle. If the gray difference with a circle pixel exceeds a set threshold, that pixel is considered different from the point under test. If 12 consecutive circle pixels differ from the point under test, the point is considered a corner, that is, a feature point.
To avoid spending time on the many non-feature points, the method first checks the gray values at circle positions 1, 5, 9 and 13. If the point under test is a feature point, at least three of these four positions must differ from it by more than the threshold; comparing these four points first therefore accelerates feature-point detection.
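The four-point pre-test can be sketched as follows; the function name, the radius-3 compass positions and the grayscale input are assumptions consistent with the usual FAST formulation:

```python
import numpy as np

def fast_pretest(img, y, x, t):
    """Quick FAST rejection test at pixel (y, x) of a grayscale image.
    Compares the center with the four compass points of the radius-3
    circle (positions 1, 5, 9, 13); a corner candidate must have at
    least 3 of the 4 differing from the center by more than t."""
    c = int(img[y, x])
    compass = [img[y - 3, x], img[y, x + 3], img[y + 3, x], img[y, x - 3]]
    differing = sum(abs(int(p) - c) > t for p in compass)
    return differing >= 3
```

Points passing this pre-test would then go through the full 16-pixel segment test described above.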
ORB uses the BRIEF algorithm to compute feature-point descriptors. The core idea of BRIEF is to select N point pairs around a feature point P according to specific rules and to concatenate the comparison results of these N point pairs into a descriptor. For each point pair around the feature point, a 0 or 1 is obtained by comparing the gray values of the two points; applying the same rule to all point pairs finally generates the ORB descriptor.
Specifically, a description region is framed with the feature point as center and a certain length as radius. 512 point pairs are selected and numbered P1(a, b), ..., Pn(a, b), ..., P512(a, b). The descriptor of the feature point is obtained by concatenating the comparison results of these pairs; an ORB feature descriptor can thus be regarded as a string of 0s and 1s.
Since an ORB descriptor is a binary string of 0s and 1s, feature-point matching can be reduced to string matching, which greatly saves storage and matching time. Matching ORB features amounts to computing the Hamming distance between two feature vectors, that is, the number of positions at which the corresponding bits differ, as in formula (9):

$$D(k_1, k_2) = \sum_{i=1}^{n} \big(k_1(i) \oplus k_2(i)\big) \tag{9}$$

where $k_1$ and $k_2$ are two binary descriptors of length $n$ and $\oplus$ is bitwise XOR.
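A minimal sketch of formula (9) on 0/1 descriptor arrays (the function name is illustrative):

```python
import numpy as np

def hamming_distance(k1, k2):
    """Hamming distance between two equal-length binary descriptors:
    XOR the bit arrays and count the positions that differ."""
    k1, k2 = np.asarray(k1), np.asarray(k2)
    return int(np.count_nonzero(k1 ^ k2))
```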

3.2. Ship feature matching
The above processing is performed on each ship target's feature points to obtain a feature-point set for each valid ship target image. Using the FLANN matching algorithm, a first feature-point set kp1 and a second feature-point set kp2 are extracted from any two valid ship target images [8]. From kp1 and kp2, the corresponding first feature description set des1 and second feature description set des2 are obtained using the SIFT or SURF feature extraction method. The FLANN feature matcher then matches des1 against des2 to obtain multiple feature-matching point pairs, and the Euclidean distance of each pair is computed as in formula (10):

$$A = (x_1, x_2, \ldots, x_n), \quad B = (y_1, y_2, \ldots, y_n), \quad d(A, B) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} \tag{10}$$

where A and B are the two feature points of a matching pair, $x_n$ and $y_n$ are the $n$-th coordinates of the feature points corresponding to A and B respectively, and $d(A, B)$ is their Euclidean distance.
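The distance computation and nearest-neighbour matching can be sketched with a brute-force search; FLANN approximates the same search with randomized k-d trees, and the ratio-test threshold here is an assumption, not a value from the paper:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two n-dimensional descriptors (formula 10)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum()))

def match_descriptors(des1, des2, ratio=0.7):
    """Nearest-neighbour matching with Lowe's ratio test: keep a match
    only when the closest descriptor in des2 is clearly closer than the
    second closest. Returns (i, j) index pairs into des1 and des2."""
    des1 = np.asarray(des1, dtype=float)
    des2 = np.asarray(des2, dtype=float)
    matches = []
    for i, d in enumerate(des1):
        dists = np.linalg.norm(des2 - d, axis=1)
        order = np.argsort(dists)
        j, k = int(order[0]), int(order[1])
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches
```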