Route Mining from Satellite-AIS Data Using Density-based Clustering Algorithm

The Automatic Identification System (AIS) provides a large volume of vessel navigation data for marine traffic research. In this paper, we propose a route mining method based on satellite AIS. The method includes data preprocessing, structural similarity calculation, clustering and route extraction. The validity of the method was verified with satellite AIS data from the seas near Australia and Warrior Strait. The method helps in understanding marine traffic patterns. It provides a reliable basis for route planning and lays a foundation for detecting abnormal ship behavior.


Introduction
As the most important mode of transport in international trade, maritime transport accounts for 90% of the global trade volume [1]. To cope with the increasingly complex maritime traffic environment and to guarantee maritime traffic safety, the Automatic Identification System (AIS) came into being. AIS is a new navigational aid system used for ship identification, positioning and collision avoidance. It automatically broadcasts and receives ship-related information. In recent years, AIS has been widely used in many areas, such as the discovery of maritime traffic characteristics, ship behavior research and situational awareness [2].
Route mining with clustering methods is a research hotspot in AIS data analysis. Liu [3] added constraints on COG (course over ground) and SOG (speed over ground) and adopted an improved density clustering algorithm named DBSCANSD to detect stationary areas and extract routes. Wang [4] took the port of call as the clustering feature and adopted a hierarchical clustering algorithm to extract shipping routes. Ma [5] used clustering and the Dijkstra algorithm to obtain the key nodes of ship navigation paths and extracted inland river routes according to the characteristics of inland navigation. Xiao et al. [6] simplified trajectories and divided them into sub-trajectories, then applied DBSCAN to obtain typical routes. Jin [7] segmented ship tracks according to speed and adopted the OPTICS algorithm to obtain track clustering results that highlighted the speed attribute. Song [8] applied the C-OPTICS algorithm with a course constraint to mine routes from AIS data from Jiaozhou Bay in Qingdao, and the results were consistent with the real ship traffic pattern. Existing trajectory clustering studies mostly use inland river and shore-based AIS data, where the ship types and movement patterns are relatively simple and the spatial span of the data is limited. Research on large-scale data analysis using satellite-based AIS is lacking.
The TianTuo-1 (TT-1) and TianTuo-3 (TT-3, which is still operating normally in orbit [9]) micro-nano satellites developed by the National University of Defense Technology have received a large amount of AIS data during their missions. Much valuable maritime traffic information can be obtained by analyzing these data. This paper proposes a route mining method for satellite-AIS data based on a density clustering algorithm. First, the satellite-AIS data are preprocessed and divided into sub-trajectories. Second, the DBSCAN algorithm clusters the sub-trajectories according to the structural similarity distance. Finally, a three-dimensional sweep line method is used to extract routes from the satellite-AIS data.

Trajectory Clustering Method
The clustering method takes sub-trajectories as clustering objects. It includes data preprocessing, structural similarity calculation, clustering and route extraction. In this paper, MMSI (the unique identifier of a ship), latitude and longitude, SOG and COG are used in data analysis. The trajectory of a ship whose MMSI is m can be written as

Data preprocessing
Raw AIS data should be preprocessed before analysis. Data that are invalid, incorrect, duplicated or inconsistent are cleaned as shown in Table 1. Then, according to the conciseness and preciseness principles proposed by Lee [10], feature points are found to compress the trajectories. For conciseness, location points with obvious changes in SOG and COG, measured by the rates of change of SOG and COG at each point, are selected as feature points. For preciseness, to reduce the loss of trajectory shape, trajectories are segmented into sections by the feature points. The D-P (Douglas-Peucker) algorithm is then applied to each section, adding at most one point per section. The distance threshold takes the smaller order of the mean value of the longitude and latitude span of the sub-trajectories. A sub-trajectory is defined as the track formed by a pair of time-adjacent feature points that belong to the same trajectory. Then the set of sub-trajectories of satellite-AIS data is as follows:

$C = \{x_1, x_2, x_3, x_4, x_5\}$ is a cluster obtained by DBSCAN. Red circles represent the $\varepsilon$-neighborhood. $x_1$ is a core object and $x_2$ is directly density-reachable from $x_1$. $x_3$ is density-reachable from $x_1$ and density-connected with $x_4$.
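The per-section Douglas-Peucker step described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes planar (longitude, latitude) coordinates, and the names `point_to_segment_distance` and `refine_section` are hypothetical. It keeps the two feature points that bound a section and adds at most one interior point, namely the one farthest from the chord, if that distance exceeds the threshold.

```python
import math


def point_to_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b (assumption: flat coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def refine_section(section, threshold):
    """One Douglas-Peucker step on a section bounded by two feature points.

    Returns the two end points plus at most one interior point: the one
    farthest from the chord, kept only if its distance exceeds `threshold`.
    """
    if len(section) < 3:
        return list(section)
    start, end = section[0], section[-1]
    best_idx, best_dist = None, threshold
    for i in range(1, len(section) - 1):
        d = point_to_segment_distance(section[i], start, end)
        if d > best_dist:
            best_idx, best_dist = i, d
    if best_idx is None:
        return [start, end]
    return [start, section[best_idx], end]


section = [(0, 0), (1, 0.05), (2, 1.0), (3, 0.1), (4, 0)]
print(refine_section(section, threshold=0.5))
```

With a threshold larger than every deviation, the section collapses to its two end points, which corresponds to a sub-trajectory formed by a pair of adjacent feature points.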

Structural similarity calculation
The similarity measurement is the basis of density clustering. We use structural similarity to describe the similarity of sub-trajectories. Structural similarity is the weighted sum of the spatial distance, the velocity distance and the directional distance. The spatial distance describes the spatial density of the sub-trajectory distribution and is defined by the Hausdorff distance. The velocity distance measures the difference in average velocity between sub-trajectories. The directional distance measures the difference in direction between sub-trajectories. These distances are given in equations (4) and (5), where $w_h$, $w_v$ and $w_\theta$ are weights that sum to 1.
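The weighted sum described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formula: the exact forms of the velocity and directional distances, any normalization, and the weight values are not given in the text, so the absolute difference of mean SOG, the smallest angle between courses, and the default weights below are all assumed. The dictionary keys `points`, `mean_sog` and `cog` are hypothetical names.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point lists (planar assumption)."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def heading_difference(c1, c2):
    """Smallest angle in degrees between two courses, in [0, 180]."""
    d = abs(c1 - c2) % 360.0
    return min(d, 360.0 - d)


def structural_distance(s1, s2, w_h=0.5, w_v=0.25, w_d=0.25):
    """Weighted sum of spatial, velocity and directional distances.

    Assumptions: weights sum to 1; velocity distance is the absolute
    difference of mean SOG; directional distance is the course difference.
    """
    spatial = hausdorff(s1["points"], s2["points"])
    velocity = abs(s1["mean_sog"] - s2["mean_sog"])
    direction = heading_difference(s1["cog"], s2["cog"])
    return w_h * spatial + w_v * velocity + w_d * direction


s1 = {"points": [(0, 0), (1, 0)], "mean_sog": 10.0, "cog": 90.0}
s2 = {"points": [(0, 1), (1, 1)], "mean_sog": 12.0, "cog": 270.0}
print(structural_distance(s1, s2))
```

A pairwise matrix of such distances can be supplied to a density-based clusterer; for instance, scikit-learn's `DBSCAN` accepts `metric="precomputed"`. In practice the three terms would likely need to be scaled to comparable ranges before weighting, which this sketch does not do.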

Route extracting
Previous studies mostly focused on shore-based AIS data and adopted a two-dimensional sweep line method to extract representative routes [6], [11]. As shown in Fig.2, the basic idea of this method is to set a sweep line that scans along the cluster direction. When the number of intersection points between the sweep line and the sub-trajectories is greater than the threshold, the average coordinate of the intersections is calculated and set as a route node; the representative route is composed of all route nodes. In this paper, satellite-AIS data are used for route extraction. Because the span of sub-trajectories from satellite-AIS is relatively large, the sub-trajectories should be treated as arcs rather than lines. The three-dimensional sweep line method differs from the two-dimensional one in that it treats sub-trajectories as arcs and rotates the sweep line around the axis instead of translating it across the sub-trajectory cluster. Fig.3 shows the realization of the 3D sweep line method. The green arcs in Fig.3(a) are the sub-trajectories of cluster $c = \{s_1, s_2, \ldots, s_i\}$, and the blue arc is the direction of cluster $c$, whose normal vector $N_c$ can be calculated by equation (7). The yellow arc in Fig.3
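For reference, the two-dimensional sweep line idea described in this subsection can be sketched as follows. This illustrates only the planar variant from earlier studies, not the three-dimensional method proposed here. It assumes each sub-trajectory is a straight segment in planar coordinates and that the cluster direction is supplied as an angle. The function name `sweep_line_route` and its parameters are hypothetical.

```python
import math


def sweep_line_route(segments, direction_deg, step, threshold):
    """Planar sweep: move a line perpendicular to the cluster direction.

    A route node (mean of the intersection points) is emitted wherever the
    sweep line crosses more than `threshold` sub-trajectories.
    """
    theta = math.radians(direction_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def to_sweep_frame(p):
        # Rotate so that the cluster direction becomes the u-axis.
        x, y = p
        return (x * cos_t + y * sin_t, -x * sin_t + y * cos_t)

    def from_sweep_frame(p):
        u, v = p
        return (u * cos_t - v * sin_t, u * sin_t + v * cos_t)

    rotated = []
    for a, b in segments:
        a, b = to_sweep_frame(a), to_sweep_frame(b)
        rotated.append((a, b) if a[0] <= b[0] else (b, a))

    u_min = min(a[0] for a, _ in rotated)
    u_max = max(b[0] for _, b in rotated)

    nodes = []
    n_steps = int(math.floor((u_max - u_min) / step))
    for k in range(n_steps + 1):
        u = u_min + k * step
        hits = []
        for (u1, v1), (u2, v2) in rotated:
            if u1 <= u <= u2:
                if u2 == u1:
                    # Segment parallel to the sweep line: use its midpoint.
                    hits.append(0.5 * (v1 + v2))
                else:
                    t = (u - u1) / (u2 - u1)
                    hits.append(v1 + t * (v2 - v1))
        if len(hits) > threshold:
            nodes.append(from_sweep_frame((u, sum(hits) / len(hits))))
    return nodes


segments = [((0, 0), (4, 0)), ((0, 2), (4, 2)), ((1, 1), (3, 1))]
print(sweep_line_route(segments, direction_deg=0.0, step=1.0, threshold=2))
```

In the three-dimensional variant, the translation of the sweep line would be replaced by a rotation about an axis and the segments by arcs on the sphere; that geometry is not reproduced in this sketch.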

Results and Discussion
The satellite-AIS data from the seas near Australia and Warrior Strait are used to test the effectiveness of the trajectory clustering method.

Cluster analysis of ship tracks in seas near Australia
The seas near south Australia (30°-40° S, 115.8°-140° E) were studied. TT-3 received 4596 messages of types 1, 2 and 3 from 18 February 2018 to 26 February 2018 (149 trajectories in total). Fig.4 shows the original data. Fig.5 and Fig.6 show the feature points and their trajectories. After data preprocessing with the parameters in Table 2, 2221 position points remained, giving a data compression rate of 48.32%. Satellite-AIS data have the disadvantage that some sub-trajectories may be too long. Such long sub-trajectories can neither describe ship behavior nor measure the spatial distance correctly, so we limited the span of the sub-trajectories composed of feature points. In this example, the latitude and longitude spans of the sub-trajectories were limited to [0.3684°, 3°] and [0.0541°, 3°], respectively. Fig.7 shows the 319 sub-trajectories selected from Fig.6, which were then used for clustering analysis.

The clustering result (Fig. 8) is good. Regions A and B in Fig.8 mark the sub-trajectories that are abnormal because of their heading. Region C is recognized as noise because it is far from the other sub-trajectories. Comparing the route extracted from satellite-AIS in Fig. 9 with the real route, the route in this example coincides with a section of the route from Melbourne Port to Fremantle Port in AUS3.

Fig. 9 Route extracted from satellite-AIS

Fig.10 shows the original data. Fig.11 and Fig.12 show the feature points and their trajectories. Data preprocessing used the parameters in Table 3.

Fig. 10 The original trajectory
Fig. 11 Comparison of original data and feature points data

The clustering result (Fig.14) is good. Considering the sailing direction of the two clusters in Fig.14, cluster 0 (in blue) consists of ships crossing the strait from north to south, and cluster 1 (in green) consists of ships crossing the strait from south to north. The clustering result indicates that the speed distance in structural similarity can distinguish well between ships with different sailing directions.
Fig.15 shows the route extraction results and enlarges the region where the two routes are closest, where the interval between them is no less than 1 km. According to the traffic separation scheme, in ports, straits and other areas with heavy traffic, the channel should be divided into two navigation lanes, each with one-way traffic. Fig.15 shows that most ships crossing Warrior Strait obeyed the traffic separation scheme. Specifically, they kept to the right. Comparing the routes extracted from satellite-AIS in Fig.15 with the real route, the routes in this example coincide with a section of the route from Brisbane Port via Warrior Strait to the southeast coast of China, Japan or Korea.

Conclusion
In this paper, we have proposed a route extraction method based on satellite-AIS. The method includes data preprocessing, structural similarity calculation, clustering and route extraction. Real data from the seas near south Australia and Warrior Strait collected by the TT-3 satellite are used to verify the effectiveness of the method. However, the method does not perform well enough in distinguishing different routes in high-density areas, possibly because the clustering algorithm misclassifies some sub-trajectories. To solve this problem, the structural similarity measure and the clustering algorithm need to be modified in future work.