A Contextual Hybrid Model for Vessel Movement Prediction

Predicting vessel movement can significantly improve maritime safety management. Although vessel movement can be a function of geographic context, current systems and methods rarely incorporate contextual information into their analysis. This paper first proposes a novel context-aware trajectory simplification method that embeds the effects of geographic context and guarantees the logical consistency of the compressed trajectories. It then proposes a hybrid method built upon a curvilinear model and deep neural networks. The proposed method uses contextual information to check the logical consistency of the curvilinear model. It then constructs a Context-aware Long Short-Term Memory (CLSTM) network that can take contextual variables, such as vessel type, into account. Through a recursive feedback loop, the proposed method can enhance prediction accuracy while maintaining logical consistency. The approach was implemented on an Automatic Identification System (AIS) dataset from the eastern coast of the United States of America, collected from November to December 2017. These implementations demonstrate its effectiveness and better compression, i.e., an 80% compression ratio while maintaining logical consistency. The compressed trajectories are 23% more similar to their original trajectories than those produced by currently used simplification methods. Furthermore, the overall accuracy of the hybrid method is 15.68% higher than that of the ordinary Long Short-Term Memory (LSTM) network. The ordinary LSTM is currently used by various maritime systems and applications, including collision avoidance, vessel route planning, and anomaly detection systems.

categorize them into three classes [3]: physical (mathematical) model-based methods, learning model-based methods, and hybrid methods. Physical models are based on equations of motion [17] and refer to a precise mathematical description of a vessel's dynamics and its surrounding environment [14]. These methods explicitly consider all possible influencing factors (mass, force, inertia, yaw rate, etc.) and calculate motion characteristics using physical laws [3]. The second class models a vessel's motion with a learning model that learns motion characteristics from historical motion data. Such a model thus implicitly integrates all possible influencing factors [3]. Among learning methods, deep learning can independently identify patterns in large amounts of complicated, unstructured data and generate predictions through a neural network model [1]. Deep learning has the potential to benefit autonomous processes in the shipping industry [1].
Hybrid methods take one of two forms. Some build a model that explicitly considers part of the influencing factors and is trained on historical motion data. Others combine different learning methods into one model to achieve better performance [3]. Hybrid methods are developed to combine the strengths of their constituent models and thereby improve prediction accuracy [3].
The movement of any object is embedded in context [18]. In this study, geographic context refers to the part of a situation or data that influences movement [19]. Context both enables and limits movement [18], and variations in contextual parameters may also cause certain behavioral responses in moving objects [20]. Although vessel movement can be significantly affected by geographic context, current systems and methods rarely incorporate contextual information into their analysis.
Within this framework, this paper's contributions are twofold. The first objective is to develop a new trajectory compression method that addresses the logical consistency of compressed trajectories. The second objective is to predict vessel locations using a novel context-aware hybrid method tailored to AIS data. The proposed method combines the strengths of physical and deep learning models in order to improve the accuracy of vessel movement prediction. To this end, the paper integrates vessel motion equations into deep neural networks. The method captures vessel information (e.g., vessel type) as well as geographic context (e.g., shoreline) to predict vessels' future positions. This is expected to increase the accuracy of vessel trajectory prediction. The method also uses geographic context to build a trajectory compression method that reduces the volume of data while maintaining the logical consistency of compressed trajectories. The compression filter is based on an algorithm that combines contextual information with topological relationships. This combination is expected to preserve the logical consistency of compressed trajectories. Considering contextual effects in this way also brings a new perspective to vessel movement prediction. Context is not used only as an input parameter to a prediction model. Instead, its effect is modeled in the trajectory preprocessing method, where it is expected to preserve logical consistency.

A. ORGANIZATION
The rest of the paper is organized as follows. Section 2 reviews related work on vessel trajectory prediction and deep neural networks. Section 3 sets out the background concepts and data sources. Section 4 presents the proposed context-aware trajectory simplification algorithm and gives an overview of the proposed framework and the method's architecture. Section 5 discusses the results, and Section 6 presents the conclusions and suggestions for future research.

II. RELATED WORKS
Physical models for trajectory prediction attempt to explicitly include all influencing factors in the modeling equations [3]. Abdelaal et al. [5] took the surge force and the yaw moment into account to build a predictive model for collision avoidance. Whatever the accuracy of their method, such detailed information is unlikely to be available for all vessels, which may limit its use. Vijverberg et al. [21] used a linear extrapolation model to predict the future locations of vessels. Although this model is relatively simple, it does not take environmental factors into account, which may reduce prediction accuracy.
Methods based on physical laws precisely describe a vessel's dynamics and its surrounding environment [14] and can calculate motion characteristics precisely [3]. These methods can greatly outperform other methods [22]. However, they are rarely used on their own to predict the trajectories of non-simulated vessels. To perform well, they require an ideal environment and accurate state assumptions, which are difficult to attain in reality [3]. Nevertheless, their ability to describe a motion system can be useful when they are combined with machine learning methods [3].
Learning model-based trajectory prediction methods ([23]-[28]) have gained increasing interest in recent years [29]. They rely on learning models to mimic the movement behavior of vessels based on historical movement data [10]. Within this class, artificial neural networks are attracting more attention for predicting vessel trajectories. Vessels are regarded as complex, dynamic technical systems [30]. Neural networks are a form of artificial intelligence that attempts to mimic the behavior of the human brain and nervous system [31]. They can learn and adapt to changing environmental conditions [30]. This makes them an attractive alternative to physical methods when no precise hydrodynamic model of the vessel is available, or when identifying the parameters of an analytical model is difficult and time-consuming [30].
Zorbas et al. [23] introduced a machine learning model using artificial neural networks (ANN). The model exploits geospatial historical patterns of vessel movements, in the form of time series, to predict future trajectories at five-minute intervals. Millefiori et al. [24] extended the work of Pallotta et al. [25] and proposed a novel long-term prediction method based on the Ornstein-Uhlenbeck (OU) process, which predicts vessels' future trajectories over long time horizons. This model outperforms the nearly constant velocity (NCV) model [24]. Mao et al. [26] trained an extreme learning machine (ELM) to predict the future locations of vessels from AIS data. Their input features included the maritime mobile service identity (MMSI), time, latitude, longitude, speed over ground (SOG), course over ground (COG), and rate of turn (ROT). For the two prediction horizons of 20 and 40 minutes, the errors appear to be less than 2.5 and 6 nautical miles, respectively.
Among learning-based models, the recurrent neural network (RNN) has become the state of the art for sequence-to-sequence learning problems. The LSTM network, the most popular RNN variant, has proven its versatility in prediction applications [28]. Gao et al. [27] constructed a bidirectional long short-term memory recurrent neural network (BI-LSTM-RNN) that uses AIS data. Their method predicts six future points with a 90-meter error based on three historical data points. Learning-based trajectory prediction methods treat all influencing factors as a single package. Their prediction quality therefore depends strongly on both the quality of the data and the learning ability of the model [3].
Hybrid algorithms benefit from the advantages of their constituent algorithms [21], [32], [33]. For example, Perera and Guedes Soares [32] combined the curvilinear motion model with an Extended Kalman Filter (EKF) to predict the velocity and acceleration of a vessel. To incorporate environmental factors into the prediction model, Vijverberg et al. [21] built upon the work of Vemula et al. [34] and added a feature vector to the Attention-LSTM. They first used linear extrapolation to predict vessels' future positions at 10-second intervals. They then trained the LSTM model to calculate the linear displacement, or offset, of the vessels from the predicted path. Zhou and Shi [33] combined the least squares support vector machine (LS-SVM) and particle swarm optimization (PSO) to build a new vessel motion prediction method. Their method predicts vessel motion at 15-second intervals.
Hybrid model-based prediction methods can combine the advantages of their sub-models and are therefore expected to perform better [32]. Accordingly, this paper combines the curvilinear model [35] and the deep learning LSTM network [36], using contextual information as well as vessel data. This combination is expected to retain logical consistency and increase prediction accuracy. The performance of learning methods for trajectory prediction depends significantly on the quality of the data [3], and spatial data quality has several aspects [50]. This paper focuses on one of them: the logical consistency of compressed trajectories. It introduces a new trajectory simplification method that combines contextual information with topological relationships to maintain the logical consistency of the data.

III. RELATED CONCEPTS AND DATA SOURCE
The term context has a wealth of different definitions in the field of computer science. To use context effectively, it is important to understand what context is and how it can be used [37]. In this paper, context is defined as part of a situation or data that influences movement [19].
According to the literature, many contextual factors affect vessel movement. These include sea state [38], wave direction [39], water depth and distance to coast [39], wave height [40], fluid speed [40], wind speed [39], [41], [42], wind direction [39], [43], air density [44], vessel type [39], [40], [43], hull design [40], and vessel length [41], [44]-[47]. Based on the definition of context given above and on data availability, this paper uses shorelines and vessel types as context. According to Allianz, grounding was the second leading cause of vessel losses and the second most prominent cause of casualties in the last decade [2]. Grounding refers to a vessel running aground on a shore, reef, or wreck [4]. To avoid such losses and casualties, water depth must be taken into account [48]. Water depth, and accordingly the shoreline, directly influences vessel movement.
Vessel type (e.g., tanker, tug, cargo) defines the properties of transport within the water medium, such as maneuverability and speed [14], [39], [40], [43]. Because different types of vessels differ in speed and agility, their maneuverability also differs [14]. Since vessel type influences movement, the contexts considered in this paper are vessel types and shorelines.
A trajectory is defined as a sequence of points with two-dimensional coordinates. As in (1), a vessel trajectory is represented by $T$, a sequence of points in two-dimensional Cartesian space [49]:
$$T = [(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)] \qquad (1)$$
where $(x_i, y_i)$ is the position of the vessel at time $t_i$.

A. PROBLEM FORMULATION
To take advantage of both physical and learning models, this study builds a hybrid method for predicting vessel trajectories. First, it uses a curvilinear model, which can model different types of motion [35], as the physical method of trajectory prediction. This curvilinear model [3] takes the position of a vessel $(x_t, y_t)$ at time $t$, together with its velocity $v$ and course at $t-1$ and $t$. From these inputs it estimates the next three positions of the vessel, $\hat{T} = [(\hat{x}_{t+1}, \hat{y}_{t+1}), (\hat{x}_{t+2}, \hat{y}_{t+2}), (\hat{x}_{t+3}, \hat{y}_{t+3})]$. The accuracy of curvilinear models may seem adequate. However, some applications require higher precision, which these models achieve only under ideal environmental conditions and accurate state assumptions [3]. Existing approaches add new parameters or rigid assumptions to the curvilinear model. This study instead achieves higher accuracy with a deep neural network. This option has only recently become possible, because large historical AIS datasets have been released that can be used to train learning-based trajectory prediction models.
Let $\hat{T}$ denote the trajectory estimated by the curvilinear model and $T$ the actual trajectory. The $k$-th point on the actual trajectory is denoted by $p_k = (x_k, y_k)$, and its corresponding point on $\hat{T}$ is $\hat{p}_k = (\hat{x}_k, \hat{y}_k)$. The deviation between $p_k$ and $\hat{p}_k$ can be defined as in (2):
$$\tau_k = (x_k - \hat{x}_k)\,\mathbf{i} + (y_k - \hat{y}_k)\,\mathbf{j} \qquad (2)$$
where $\mathbf{i}$ and $\mathbf{j}$ are the standard basis vectors of two-dimensional Cartesian space in the $x$ and $y$ directions, respectively. If $\tau_k$ is known, $\hat{T}$ can be transformed into the actual trajectory $T$. Therefore, this paper introduces a novel context-aware LSTM network (CLSTM) that models $\tau_k$ at each point of $\hat{T}$. It then transforms $\hat{T}$ into $T$, which is expected to improve prediction accuracy. The modified LSTM-based deep neural network is trained to mimic vessel movement behavior by minimizing $\tau_k$, i.e., the differences between the actual and predicted trajectories. This allows vessels' future trajectories to be predicted with high accuracy. Detecting anomalous movement behavior is outside the scope of this paper. The performance of LSTM networks depends significantly on the quality of the data [3]. Spatial data quality has several aspects [50], including positional accuracy, temporal accuracy, semantic accuracy, completeness, and logical consistency. This paper focuses only on the logical consistency of movement data in the trajectory compression phase. To this end, it builds a new trajectory preprocessing filter that aims to maintain the logical consistency of compressed trajectories.
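The following minimal Python sketch makes the role of $\tau_k$ concrete. It assumes that (2) is the actual point minus the curvilinear estimate, so that adding $\tau_k$ back to $\hat{T}$ recovers $T$. In the proposed method $\tau_k$ is predicted by the CLSTM; here it is computed from known actual points purely for illustration. The function names `deviations` and `apply_deviations` and the coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def deviations(actual: List[Point], estimated: List[Point]) -> List[Point]:
    """tau_k = (x_k - x_hat_k) i + (y_k - y_hat_k) j, returned as (dx, dy) pairs."""
    if len(actual) != len(estimated):
        raise ValueError("trajectories must have the same number of points")
    return [(x - xh, y - yh) for (x, y), (xh, yh) in zip(actual, estimated)]


def apply_deviations(estimated: List[Point], taus: List[Point]) -> List[Point]:
    """Shift each estimated point by its (predicted) deviation: p_k = p_hat_k + tau_k."""
    return [(xh + dx, yh + dy) for (xh, yh), (dx, dy) in zip(estimated, taus)]


t_hat = [(0.0, 0.0), (100.0, 5.0), (200.0, 10.0)]   # curvilinear estimate (m), illustrative
t_act = [(0.0, 1.0), (98.0, 8.0), (197.0, 15.0)]     # actual positions (m), illustrative
taus = deviations(t_act, t_hat)
print(taus)                                    # [(0.0, 1.0), (-2.0, 3.0), (-3.0, 5.0)]
print(apply_deviations(t_hat, taus) == t_act)  # True
```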

B. DATA SOURCES AND PREPROCESSING
Two datasets are used to model and predict vessel movement: the USA shoreline and vessel AIS data. The USA shoreline dataset is released in ESRI shapefile format by the National Oceanic and Atmospheric Administration (NOAA) [51] and is used in the proposed simplification method.
The vessel AIS dataset from the eastern coast of the United States of America is publicly available [52]. It contains AIS messages from 10676 vessels over about two months, i.e., from November 2017 to the end of December 2017. The dataset includes 58,545,206 AIS records occupying 6.68 GB. The November data include 31,043,610 points and the December data include 27,501,597 points. Each vessel state contains time, SOG, COG, and heading, with resolutions of 1 minute, 0.1 knots, 0.1°, and 1°, respectively.
Raw AIS data are often noisy because of the nature of signal acquisition [53]. Anomaly detection in preprocessing refers to methods that detect and clean anomalies in trajectory data [54], [55]. To correct outliers, this paper applies the moving average method [55] to the raw AIS data. In addition, erroneous AIS messages whose MMSI number is not exactly nine digits long are eliminated. This syntax check removes 62 MMSIs with 135,878 corresponding AIS messages. Fig. 1 illustrates the AIS messages recorded in December 2017 after the preprocessing phase. The AIS messages are grouped by vessel type, and Fig. 2 shows the number of vessels of each type in the dataset. The AIS messages are then converted into trajectories with MovingPandas [56], a Python library. The military group contained only 20 trajectories, so it is eliminated because of its limited number of AIS messages. Finally, the trajectories are simplified to reduce both the size of the data and the computational complexity of the process.
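The two cleaning steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. The centred window of three messages is assumed, since the paper does not state the window size. Records are assumed to be already grouped by MMSI and sorted by time. The field names `mmsi` and `lat` and the MMSI values are hypothetical.

```python
from typing import List


def valid_mmsi(mmsi) -> bool:
    """Syntax check: an MMSI must consist of exactly nine digits.

    If MMSIs are stored as integers, identifiers with leading zeros would fail here.
    """
    s = str(mmsi)
    return len(s) == 9 and s.isdigit()


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centred moving average; the window shrinks at both ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Messages sorted by time (hypothetical values).
records = [
    {"mmsi": 123456789, "lat": 40.00},
    {"mmsi": 12345678, "lat": 40.01},    # eight digits: rejected by the syntax check
    {"mmsi": 123456789, "lat": 40.50},   # outlier latitude
    {"mmsi": 123456789, "lat": 40.02},
]
clean = [r for r in records if valid_mmsi(r["mmsi"])]
smoothed = moving_average([r["lat"] for r in clean], window=3)
print(len(clean))  # 3
```

A centred window is used so that the smoothed value is not shifted in time. With a trailing window, every smoothed position would lag behind the true one.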

A. CONTEXTUAL TRAJECTORY COMPRESSION
VOLUME 9, 2021
A vessel transmits AIS messages at frequent intervals of approximately 2 s to 6 min [57]. Because AIS messages are transmitted at such a high rate, vessel behavior is very regular [58]. This regularity makes it possible to compress vessel trajectories to a much smaller volume without losing important information [58], which makes them easier to store and analyze [59]. Trajectory compression eliminates the points of a trajectory that contain no new information [60], in order to reduce the overall volume of data [59]. Compression accelerates further processing [61] and improves its performance [59]. The ultimate criterion for compression quality is how far the data can be compressed without impairing the use of the trajectory data in further processing [58]. Algorithms use different criteria to identify splitting points [59]. These include offset distance [62], time, velocity, or combinations of them, e.g., time and distance [59] or velocity and distance [58]. This paper employs three criteria for its trajectory simplification method: offset distance, logical consistency, and similarity. To maintain logical consistency, the topological signatures of the original and simplified trajectories should be the same [63].
In practice, vessel trajectories cannot intersect the coastline. If the compression process generates an intersection, the generated segment is invalid. This can be checked by validating the topological relationship between the simplified trajectories and the coastlines. Therefore, to compare the topological signatures of the original and simplified trajectories, the proposed method checks for intersections between trajectory segments and the coastlines. During compression, the original trajectory is deformed into a new geometry with a smaller set of vertices. Logical consistency [64] may be violated if other geometries, such as the shoreline, are found within the convex hull of the set of vertices forming the trajectory [65]. Fig. 3 illustrates an example of the problem. The black line is an actual trajectory and the red line is its compressed version, which passes into the shoreline and is therefore not logically consistent. The method is also expected to balance the compression ratio of simplified trajectories against their similarity to the originals. Similarity is measured with the Dynamic Time Warping (DTW) method, which warps trajectories non-linearly and can handle varying sampling rates [66]. For trajectories $T_i$ and $T_j$, DTW minimizes the cumulative distance over the possible warping paths between their vertices. Based on (3), different distance functions can be used in DTW [67].
where $n$ is the length of $T_j$ and $\delta(w_s)$ is a distance function. This work uses the Euclidean distance as the distance function in DTW.
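Below is a minimal sketch of DTW with a Euclidean distance function. It uses the standard dynamic-programming recursion $D(i, j) = \delta(a_i, b_j) + \min\{D(i-1, j), D(i, j-1), D(i-1, j-1)\}$ and assumes planar coordinates (e.g., projected metres). A smaller value means the two trajectories are more similar. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic time warping with Euclidean point distance, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])        # Euclidean delta
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
compressed = [(0.0, 0.0), (3.0, 0.0)]
print(dtw_distance(original, compressed))  # 2.0
```

The two intermediate vertices are matched to the nearest retained end point. This accounts for the non-zero distance between the original trajectory and its compressed version.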
To build a new trajectory simplification method, this paper extends the two-stage Piecewise Linear Segmentation (2stage-PLS) method with the idea of Tienaah et al. [63]. First, it integrates contextual information into the PLS method in order to maintain logical consistency. The new method is called Context-aware Piecewise Linear Segmentation (CPLS) and works as follows. For a trajectory $T$ with $n$ vertices, the first and last points, $p_1$ and $p_n$, are selected, and $E_\mu$ is computed for all intermediate points based on (4).
where $(x_k, y_k)$ is a point in the trajectory. If the computed $E_\mu$ is greater than the offset distance $\epsilon_d$, the procedure is applied again with the corresponding point $p_i$. When no error exceeds the threshold, the method visits the remaining vertices $p_2, \dots, p_{n-1}$ in turn, starting by eliminating $p_2$. It then checks whether the topological signature is preserved. If removing $p_2$ destroys the topological signature, $p_2$ is added back to the trajectory. Otherwise, $p_2$ stays eliminated and the algorithm continues with the next point.
The pseudo-code of the CPLS algorithm based on adding a topological signature is summarized in Algorithm 1.
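The CPLS idea can be sketched in Python under several assumptions:

- $E_\mu$ is taken to be the perpendicular distance from a vertex to the chord between the segment end points. The exact form of (4) may differ.
- The shoreline is given as a list of line segments in the same planar coordinates.
- The topological-signature test is reduced to a single rule: no simplified segment may touch or cross the shoreline.

All helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def offset_error(p: Point, a: Point, b: Point) -> float:
    """Assumed E_mu: perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def _orient(a: Point, b: Point, c: Point) -> float:
    """Sign tells on which side of the directed line a->b the point c lies."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_box(a: Point, b: Point, c: Point) -> bool:
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True                       # proper crossing
    # touching or collinear overlap is also treated as a violation
    return ((o1 == 0 and _in_box(p1, p2, q1)) or (o2 == 0 and _in_box(p1, p2, q2))
            or (o3 == 0 and _in_box(q1, q2, p1)) or (o4 == 0 and _in_box(q1, q2, p2)))


def cpls(points: Sequence[Point], eps_d: float, shoreline: List[Segment]) -> List[int]:
    """Return the indices of the vertices kept by a context-aware PLS pass."""
    if len(points) < 3:
        return list(range(len(points)))
    keep = {0, len(points) - 1}

    def crosses(a: Point, b: Point) -> bool:
        return any(segments_intersect(a, b, s, e) for s, e in shoreline)

    def split(i: int, j: int) -> None:
        if j - i < 2:
            return
        err, k = max((offset_error(points[m], points[i], points[j]), m)
                     for m in range(i + 1, j))
        if err > eps_d:                   # offset too large: split at the worst vertex
            keep.add(k)
            split(i, k)
            split(k, j)
            return
        last = i                          # try to drop p_{i+1} ... p_{j-1} one at a time
        for m in range(i + 1, j):
            if crosses(points[last], points[m + 1]):
                keep.add(m)               # dropping p_m would cross the shoreline: restore it
                last = m

    split(0, len(points) - 1)
    return sorted(keep)


track = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.05), (3.0, 0.0)]
shore = [((1.5, -1.0), (1.5, 0.02))]      # a spit of land reaching up to y = 0.02
print(cpls(track, 0.1, shore))            # [0, 2, 3]
print(cpls(track, 0.1, []))               # [0, 3]  (plain PLS behaviour)
```

In the example, both intermediate vertices lie within $\epsilon_d$ of the chord. Plain PLS would therefore reduce the trajectory to its end points. That chord, however, crosses the shoreline, so the sketch restores the vertex at index 2.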
Although CPLS is expected to solve the problem of logically inconsistent compressed trajectories, it may fail to retain stops in trajectories.

Algorithm 1 Context-aware piecewise linear segmentation
Input: trajectory $T$, offset distance $\epsilon_d$
1: Build topological relations for trajectory $T$
…
add vertex $T_j$ to $T_c$
21: return $T_c$

A trajectory may be compressed in such a way that the vessel appears to move slowly in the compressed trajectory, whereas in the original trajectory it stops for a period of time [58]. To deal with this problem, the CPLS method is extended. Two heuristics underlie this extension. First, the stopping and moving points of vessels, which are important in behavior analysis, are more apparent in the derivative of the trajectory than in the trajectory itself [58]. That derivative is the speed of the moving object. Second, the status field of some AIS messages carries rich information about vessel movement that can help identify stopping and moving behavior. Accordingly, an extension of the CPLS trajectory compression method, called 2stage-CPLS, is proposed, as shown in Algorithm 2.
As shown in Fig. 4, for a given trajectory $T = [p_1, p_2, \dots, p_n]$, the method first checks the status field of each point. Suppose the status field indicates that $p_i, p_{i+1}, \dots, p_{i+k}$ are stay points, i.e., the status value indicates that the vessel is at anchor. The algorithm then stores these points in a temporary variable $sp$ and computes their centroid. The points are eliminated and replaced by their centroid. This continues with the remaining vertices of the trajectory, $p_{i+k+1}, p_{i+k+2}, \dots, p_n$. In addition, CPLS is applied to the speed time series of the trajectory using a one-dimensional error measure $E_v$ [58]. For the speed time series $V$ of a trajectory $T$, this error measure is defined as follows:
$$E_v = |v_i - \hat{v}_i|$$
where $v_i$ is the speed of the vessel at point $(x_i, y_i)$ and $\hat{v}_i$ is the speed of the vessel at point $(\hat{x}_i, \hat{y}_i)$. The point $(\hat{x}_i, \hat{y}_i)$ lies on the line segment between $(x_1, y_1)$ and $(x_n, y_n)$. While compressing the trajectory segment between $(x_1, y_1)$ and $(x_n, y_n)$, the algorithm calculates the error caused by eliminating each point $(\hat{x}, \hat{y})$ on that segment. There must be some moving points between two stay points [68]. The point $(\hat{x}, \hat{y})$ can therefore be either a stay point or a moving point, depending on the situation. The one-dimensional error measure $E_v$ computes the difference between the actual speed $v_i$ and the estimated speed $\hat{v}_i$. The pseudo-code of the 2stage-CPLS method is presented in Algorithm 2. The method uses the vessel speeds estimated at the trajectory simplification stage. AIS data can have varying transmission frequencies, so the time interval between two records can vary. Vessel speed is the ratio between distance and time interval. Using it normalizes the traveled distance against the varying time intervals of AIS data, which lets the method handle a potentially varying AIS transmission rate.
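The two ingredients of 2stage-CPLS described above can be sketched as follows. The assumptions are labeled here and in the comments:

- Stay points are identified by AIS navigational status code 1 ("at anchor").
- The estimated speed $\hat{v}_i$ is obtained by linear interpolation in time between the segment end points.
- The speed pass is a plain recursive split without the topological check.

Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _centroid(run: Sequence[Point]) -> Point:
    return (sum(p[0] for p in run) / len(run), sum(p[1] for p in run) / len(run))


def collapse_stay_points(points: Sequence[Point], status: Sequence[int],
                         stay_codes: frozenset = frozenset({1})) -> List[Point]:
    """First stage: replace each run of consecutive stay points by its centroid.

    stay_codes is an assumption: AIS navigational status 1 means "at anchor".
    """
    out: List[Point] = []
    sp: List[Point] = []                 # temporary buffer of stay points
    for p, s in zip(points, status):
        if s in stay_codes:
            sp.append(p)
            continue
        if sp:
            out.append(_centroid(sp))
            sp = []
        out.append(p)
    if sp:                               # trajectory ends while the vessel is stopped
        out.append(_centroid(sp))
    return out


def speed_error(t: Sequence[float], v: Sequence[float], i: int, j: int, k: int) -> float:
    """E_v = |v_k - v_hat_k|, v_hat_k interpolated linearly in time between i and j (assumed)."""
    if t[j] == t[i]:
        return abs(v[k] - v[i])
    v_hat = v[i] + (v[j] - v[i]) * (t[k] - t[i]) / (t[j] - t[i])
    return abs(v[k] - v_hat)


def pls_speed(t: Sequence[float], v: Sequence[float], eps_v: float) -> List[int]:
    """Second stage: recursive split of the speed time series using E_v."""
    if len(v) < 3:
        return list(range(len(v)))
    keep = {0, len(v) - 1}

    def split(i: int, j: int) -> None:
        if j - i < 2:
            return
        err, k = max((speed_error(t, v, i, j, m), m) for m in range(i + 1, j))
        if err > eps_v:
            keep.add(k)
            split(i, k)
            split(k, j)

    split(0, len(v) - 1)
    return sorted(keep)


t = [0.0, 60.0, 120.0, 180.0, 240.0]    # seconds
v = [5.0, 5.0, 5.0, 0.0, 0.0]           # m/s: the vessel stops after t = 120 s
print(pls_speed(t, v, 1.54))            # [0, 2, 3, 4]
```

With $\epsilon_v$ = 1.54 m/s (3 knots), the constant-speed vertex at index 1 is dropped. The vertices around the stop are kept, which is the behavior the speed pass is meant to preserve.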

B. CONTEXTUAL MOVEMENT PREDICTION
This paper predicts vessel movement using a hybrid method, which makes it possible to combine the advantages of both physical and learning-based methods into a single prediction method. This follows the framework that is illustrated in Fig. 5.

C. CURVILINEAR MODEL
The curvilinear model can model a variety of motion patterns, including linear, circular, and parabolic motion [35]. Equation (7) describes the standard discrete-time state system [35].
where $x$ is the input vector of vessel positions, and $F$ and $G_k(T)$ are two transition matrices that describe the nonlinear motion properties of the system [3]. Fig. 6 shows the elements of the curvilinear model. Evaluating the matrix $G_k(T)$ in (7) involves a nonlinear matrix integral that is hard to compute precisely [3]. It is therefore sensible to adopt the assumptions that Zhang et al. [3] made to simplify (7):

1) Most of the white-noise excitation $w_k$ can be captured by the time-varying discrete-time forcing matrix $G_k(T)$.
2) Vessels can move only in a horizontal plane, i.e., no vertical movement is possible.
Under these assumptions, the curvilinear model can be constructed with the standard curvilinear-motion models from kinematics [3]. In addition, de Vries et al. [58] show that the high transmission frequency of AIS messages makes it possible to assume that vessels move linearly between two AIS messages. Both the normal and tangential accelerations can therefore be set to zero [58]. This means that curves can be replaced by linear segments, and the motion of a vessel between two AIS messages can be approximated as follows:
where $v_i$ is the velocity of the vessel at $t_i$, together with its course at $t_i$. $dt_i$ is the time interval between two AIS messages, and $L_i$ is the distance the vessel moves between them. $(x_t, y_t)$ is the actual coordinate and $(\hat{x}_t, \hat{y}_t)$ is the estimated coordinate of the vessel at $t_i$. The method uses the vessel's current position $(x_t, y_t)$, current velocity $v_t$, current course, and the velocity $v_{t-1}$ and course prior to the current time $t$. From these, it predicts the positions of the vessel up to three future epochs. These three predicted positions form a new trajectory $\hat{T} = [(\hat{x}_{t+1}, \hat{y}_{t+1}), (\hat{x}_{t+2}, \hat{y}_{t+2}), (\hat{x}_{t+3}, \hat{y}_{t+3})]$. It may not be possible to capture all environmental factors and states that affect vessel movement. As a result, the predicted trajectory $\hat{T}$ may not exactly fit the actual trajectory $T$. To match $\hat{T}$ and $T$, the deviation $\tau_k$ between the estimated and actual coordinates must be computed. This paper introduces a method for computing $\tau_k$ based on contextual data. The method considers the spatial and temporal components of movement to estimate coordinates with minimum difference from the actual coordinates. It is called the context-aware LSTM (CLSTM) network and aims to improve the accuracy of movement prediction.
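The linear approximation can be illustrated with a sketch that dead-reckons three future positions with zero normal and tangential acceleration. The sketch makes several assumptions. Coordinates are projected and in metres, with x pointing east and y north. Speed is in m/s, course is in degrees clockwise from north, and `dt` is a constant interval. It uses only the current speed and course, so the paper's use of $v_{t-1}$ and the previous course is not reproduced. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def predict_linear(x: float, y: float, speed: float, course_deg: float,
                   dt: float, steps: int = 3) -> List[Tuple[float, float]]:
    """Dead-reckon `steps` future positions assuming zero normal and tangential acceleration."""
    theta = math.radians(course_deg)    # course: degrees clockwise from north (assumed)
    step = speed * dt                   # L_i: distance covered in one interval
    return [(x + k * step * math.sin(theta),   # east component
             y + k * step * math.cos(theta))   # north component
            for k in range(1, steps + 1)]


# A vessel heading due north at 10 m/s, one AIS message per minute:
print(predict_linear(0.0, 0.0, 10.0, 0.0, 60.0))
# [(0.0, 600.0), (0.0, 1200.0), (0.0, 1800.0)]
```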

D. CONTEXTUAL CONDITION
As explained in Section C, the trajectory $\hat{T}$ estimated by the curvilinear model must be logically consistent. For example, it must not cross land or untraversable corridors, since vessel trajectories cannot cross shorelines or islands. As shown in Fig. 7, the risk that $\hat{T}$ intersects the shoreline is higher in areas close to shorelines. Such problems can be avoided by reducing the time step $dt$ so that movement is monitored at a higher temporal resolution (see (7)). The reduced time step $t_\alpha$ is expressed by (12).
where $n$ is the number of trajectory vertices and $dt_i$ is the time interval between two consecutive trajectory vertices.
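One way to express the contextual condition in code is sketched below. The refinement rule is hypothetical: the time step is simply halved until no segment of the predicted path crosses land, which is not the rule given by $t_\alpha$ in (12). `predict` and `crosses_land` are caller-supplied callables, for instance a dead-reckoning model and a shoreline intersection test.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def predict_with_context(predict: Callable[[float], List[Point]], start: Point,
                         crosses_land: Callable[[Point, Point], bool],
                         dt: float, min_dt: float) -> Tuple[float, List[Point]]:
    """Shrink the time step until the predicted path is logically consistent.

    Halving dt is a hypothetical refinement rule, not the paper's t_alpha from (12).
    """
    while True:
        path = [start] + predict(dt)
        if not any(crosses_land(a, b) for a, b in zip(path, path[1:])):
            return dt, path
        if dt / 2 < min_dt:
            raise ValueError("no logically consistent prediction for dt >= min_dt")
        dt /= 2


def step_east(dt: float) -> List[Point]:
    """Toy motion model: eastbound at 10 m/s, three future epochs."""
    return [(10.0 * dt * k, 0.0) for k in (1, 2, 3)]


def hits_land(a: Point, b: Point) -> bool:
    """Toy land test: everything at x >= 1000 m is land."""
    return max(a[0], b[0]) >= 1000.0


dt_used, path = predict_with_context(step_east, (0.0, 0.0), hits_land, dt=60.0, min_dt=1.0)
print(dt_used)  # 30.0
```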

E. CLSTM STRUCTURE
Vessel type (e.g., tanker, tug, cargo) can directly influence maneuverability [14], which can strongly affect movement patterns. Vessel type is therefore a promising grouping variable that is neither spatial nor temporal. It can help minimize the risk of Simpson's paradox, which may arise when a single global learning model predicts the trajectories of all vessel types. This paper provides a CLSTM model that takes this context into account to predict the movements of specific vessel types. LSTM networks can solve long-term dependency problems. An LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. Fig. 8 illustrates the internal schematic of an LSTM cell unit.
The structure illustrated in Fig. 8 allows recursive feedback loops that can update the weights in real time. It also helps prevent vanishing and exploding gradients. The first step in training an LSTM network is state initialization, in which the number of input-layer nodes, the number of output nodes, and the number of cell units are determined. Here, both the initial cell state and the link weights of each layer are set to zero. Equation (13) calculates the output.
where $w_{ij}$ is the link weight, $x_i$ is an input unit, and $\theta_j$ is the offset. The next step determines how much information is discarded from the cell state, using a sigmoid layer called the forget gate layer. Equation (14) generates a number between zero and one that indicates how much information is kept. A value of one keeps all information, whereas zero eliminates all of it.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (14)$$
where $f_t$ is the forget gate, $\sigma$ is the sigmoid function, $W_f$ is the weight matrix, and $b_f$ is the bias. The next step handles the information to be stored in the cell state. First, an input gate layer ($i_t$), which is a sigmoid layer, decides which values will be updated. Then, a hyperbolic tangent (tanh) layer creates a vector of new candidate values, $\tilde{C}_t$, that can be added to the state. These steps are expressed by (15) and (16), respectively:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (15)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (16)$$
The old state values ($C_{t-1}$) and the new candidate values ($\tilde{C}_t$) are multiplied by the forget gate ($f_t$) and the input gate ($i_t$), respectively. As shown in (17), the sum of these two products gives the new cell values:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (17)$$
Finally, the model runs a sigmoid layer, the output gate, to decide which parts of the cell state are output. These values are passed as the state value ($h_t$) to the next unit, as described in (18) and (19):
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (18)$$
$$h_t = o_t \odot \tanh(C_t) \qquad (19)$$
Equation (19) passes the cell state through tanh, which pushes the values into the range −1 to 1. Multiplying these values by the output of the sigmoid layer ensures that only the important and effective parts are updated. The square root of the difference between the output value ($O$) and the expected output ($y$) is used as the training error of each batch, based on (20).
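A single forward step of an LSTM cell following (14)-(19) can be written with NumPy as below. This is a sketch of a plain LSTM cell, not the CLSTM itself, and the vessel-type context is not modeled. The four gate weight matrices are stacked into one matrix `W` in an assumed order: forget, input, candidate, output. Small random weights are used in the example only so that the output is non-trivial. With all-zero weights, the hidden state would stay at zero.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W has shape (4*H, H + D) and b has shape (4*H,)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_pre, i_pre, g_pre, o_pre = np.split(z, 4)   # assumed gate order
    f_t = sigmoid(f_pre)                # (14) forget gate
    i_t = sigmoid(i_pre)                # (15) input gate
    c_tilde = np.tanh(g_pre)            # (16) candidate values
    c_t = f_t * c_prev + i_t * c_tilde  # (17) new cell state
    o_t = sigmoid(o_pre)                # (18) output gate
    h_t = o_t * np.tanh(c_t)            # (19) state passed to the next unit
    return h_t, c_t


H, D = 4, 2                             # hidden units; input is an (x, y) pair
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)         # zero initial state, as in the text
for x_t in np.array([[0.0, 1.0], [-2.0, 3.0], [-3.0, 5.0]]):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```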
For the learning process, each position is treated as a variable. The proposed method based on the CLSTM network takes three vertices of T as its input and transforms them into the corresponding vertices of T. The input layer can be expressed based on (21).
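One way to build training samples from a trajectory is a sliding window of three consecutive vertices. The sketch below pairs each window with the vertex that follows it as the prediction target; this pairing, the (x, y) vertex representation, and the name `make_windows` are assumptions for illustration rather than the exact construction behind (21).

```python
from typing import List, Tuple

Vertex = Tuple[float, float]  # (x, y) position in a planar coordinate system


def make_windows(trajectory: List[Vertex], window: int = 3
                 ) -> Tuple[List[List[Vertex]], List[Vertex]]:
    """Pair each run of `window` consecutive vertices with the vertex that follows it."""
    inputs: List[List[Vertex]] = []
    targets: List[Vertex] = []
    for start in range(len(trajectory) - window):
        inputs.append(trajectory[start:start + window])
        targets.append(trajectory[start + window])
    return inputs, targets


track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
X, y = make_windows(track)
print(len(X), X[0], y[0])  # → 2 [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)] (3.0, 1.5)
```

A trajectory with fewer than four vertices yields no samples under this scheme.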
The residuals are measured by the Root Mean Square Error (RMSE). To measure the contribution of the contextual data to the prediction accuracy, the point-wise error [28] is calculated for both models. The point-wise error is the distance between each predicted vertex and the corresponding actual vertex in 2D Cartesian coordinates.
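The sketch below computes the point-wise error and an RMSE for a predicted trajectory. It assumes vertices are given as (x, y) pairs in a common planar coordinate system and that the RMSE is taken over the Euclidean point-wise distances; the paper's exact RMSE formula may aggregate errors differently.

```python
import math
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float]


def pointwise_errors(predicted: Sequence[Vertex], actual: Sequence[Vertex]) -> List[float]:
    """Euclidean distance between each predicted vertex and its actual counterpart."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty trajectories of equal length")
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def mean_pointwise_error(predicted: Sequence[Vertex], actual: Sequence[Vertex]) -> float:
    """Average point-wise distance over all vertex pairs."""
    errors = pointwise_errors(predicted, actual)
    return sum(errors) / len(errors)


def rmse(predicted: Sequence[Vertex], actual: Sequence[Vertex]) -> float:
    """Root of the mean squared point-wise distance."""
    errors = pointwise_errors(predicted, actual)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


pred = [(0.0, 0.0), (3.0, 4.0)]
actual_track = [(0.0, 0.0), (0.0, 0.0)]
print(mean_pointwise_error(pred, actual_track))  # → 2.5
```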

A. TRAJECTORY COMPRESSION
To assess the effectiveness of the 2stage-CPLS compression method, 500 randomly selected trajectories are used for testing. These trajectories are compressed using both the 2stage-PLS and the 2stage-CPLS, and the results are compared. Both methods are also run under different settings to test different scenarios: the offset distance ε_d ranges from 0.02 to 2 km in increments of 0.02 km, and the speed offset ε_v is set to 1.54 m/s (3 knots). With these settings, 5000 compressed trajectories were created by each method. Two performance measures are used to evaluate and compare the 2stage-CPLS and 2stage-PLS trajectory compression methods: the compression ratio R and the similarity between the compressed and the original trajectories. R is defined by (26), and the similarity is measured based on (2).
where N is the number of trajectory vertices after compression and M is the number before compression. The methods are also evaluated on how well they preserve the logical consistency of the compressed trajectories. To do so, the topological relationships between the output trajectories and the geographic data used as contextual data are examined after compression. If a compressed trajectory fails to maintain any of these relationships, e.g., by crossing the shoreline, it is considered logically inconsistent. All compressed trajectories are checked, and none of those compressed with the 2stage-CPLS intersects a shoreline, which means that the 2stage-CPLS is able to maintain logical consistency. In contrast, most of the trajectories compressed with the 2stage-PLS appear to fail the logical consistency test in some cases.
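The two checks described above can be sketched as follows. The compression ratio is assumed to be R = (1 − N/M) × 100%, which is consistent with R growing as more vertices are removed; the shoreline is assumed to be a polyline in the same planar coordinates as the trajectory, and a trajectory is flagged as inconsistent if any of its segments touches or crosses it. The helper names are hypothetical, and a production system would more likely rely on a geometry library.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def compression_ratio(m_before: int, n_after: int) -> float:
    """R in percent, assumed to be (1 - N/M) * 100."""
    if m_before <= 0 or not 0 <= n_after <= m_before:
        raise ValueError("expected 0 <= N <= M and M > 0")
    return 100.0 * (m_before - n_after) / m_before


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    """For collinear p, q, r: True if r lies within the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment ab touches or crosses segment cd."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def is_logically_consistent(trajectory: List[Point], shoreline: List[Point]) -> bool:
    """Consistent here means no trajectory segment meets any shoreline segment."""
    return not any(segments_intersect(a, b, c, d)
                   for a, b in zip(trajectory, trajectory[1:])
                   for c, d in zip(shoreline, shoreline[1:]))


print(compression_ratio(100, 20))  # → 80.0
print(is_logically_consistent([(0.0, 0.0), (2.0, 2.0)], [(0.0, 2.0), (2.0, 0.0)]))  # → False
```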
According to the experiments, the compression ratios of the 2stage-CPLS and the 2stage-PLS behave differently. As illustrated in Fig. 9, while the compression ratio of the 2stage-PLS increases with the offset distance, the compression ratio of the 2stage-CPLS decreases gradually. Across offset distances, the compression ratio of the 2stage-PLS appears to be consistently higher than that of the 2stage-CPLS. Important vertices are the trajectory vertices that must be kept during compression to maintain logical consistency or spatial accuracy. Equation (26) defines the percentage of important vertices, dR.
dR is the percentage of vertices whose removal would violate the logical consistency of the compressed trajectories. As shown in Fig. 9, the R of the 2stage-CPLS appears to decrease as the offset distance increases. This suggests that the percentage of important vertices, dR, grows with the offset distance. During compression, trajectory vertices that carry no new information (due to the high transmission rate of AIS messages) are removed, which may change the trajectory's geometry [61]. Compression with a larger offset distance can potentially cause a larger geometrical change, or deformation. This is illustrated in Fig. 10, which shows a trajectory compressed with the 2stage-PLS.
In Fig. 10, the trajectory compressed with a 200-meter offset (red line) has the smallest deformation, and its topological signature differs only slightly from that of the raw trajectory (white line). However, this trajectory intersects land in one place (see the cyan arrow). The number of such intersections, which break the logical consistency of compressed trajectories, increases as the offset distance grows. To maintain logical consistency, these intersections have to be avoided. Therefore, the 2stage-CPLS method compares the topological relations between trajectories and contextual data and avoids such intersections by retaining important vertices. A larger deformation means that the 2stage-CPLS has to retain more vertices; as a result, the R of the 2stage-CPLS decreases as the offset distance increases.
Learning-based methods for movement prediction tend to mimic the movement behaviors of vessels based on their historical movement data [10]. Therefore, the compressed dataset is expected to remain representative of the variety of possible behaviors. A compression method should thus preserve logical consistency while producing the best replacements for the initial trajectories, i.e., compressed trajectories with maximum similarity to them. For this reason, the similarity between compressed and raw trajectories is compared for both methods as another evaluation measure. This is illustrated in Fig. 10. The 2stage-CPLS presented in this paper restores removed vertices to resolve topological conflicts.
As shown in Fig. 11, the similarity of trajectories compressed with the 2stage-PLS decreases as the offset distance increases. In the experiments, the similarity measure for the 2stage-CPLS appears to converge, which makes it possible to select an optimum offset distance for compression. At the optimum offset distance, the compressed dataset appears to be the best replacement for the original dataset while having a smaller volume.
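One simple way to pick an optimum offset distance from curves such as those in Figs. 9 and 11 is to maximize a weighted combination of similarity and compression ratio. The sketch below assumes both measures are already scaled to [0, 1] and that an equal weighting is acceptable; the scoring rule, the name `optimal_offset`, and the sample values are illustrative assumptions, not the paper's procedure or data.

```python
from typing import Sequence


def optimal_offset(offsets: Sequence[float], similarity: Sequence[float],
                   ratio: Sequence[float], weight: float = 0.5) -> float:
    """Offset maximizing weight * similarity + (1 - weight) * ratio.

    Both measures are assumed to be scaled to [0, 1]; `weight` sets the trade-off.
    """
    if not (len(offsets) == len(similarity) == len(ratio)) or not offsets:
        raise ValueError("need equal-length, non-empty sequences")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    scores = [weight * s + (1.0 - weight) * r for s, r in zip(similarity, ratio)]
    best = max(range(len(offsets)), key=scores.__getitem__)
    return offsets[best]


# Illustrative numbers only, not results from this paper.
print(optimal_offset([0.2, 0.4, 0.6], [0.95, 0.90, 0.70], [0.60, 0.80, 0.85]))  # → 0.4
```

Setting `weight` to 1 recovers the most similar compression and setting it to 0 the most compact one; intermediate values express the trade-off.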

B. MOVEMENT PREDICTION
To implement the proposed hybrid context-aware method, the AIS dataset is preprocessed and compressed with the 2stage-CPLS. The compressed dataset contains 52,628 trajectories, 80% of which are used for training and the remaining 20% for validating the hybrid method (2stage-CPLS + CLSTM).
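The 80/20 split can be reproduced with a seeded shuffle. The sketch assumes a simple random split over whole trajectories (the section does not state whether the split is stratified, e.g., by vessel type); `split_train_validation` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(items: Sequence[T], train_fraction: float = 0.8,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, val = split_train_validation(range(52628))
print(len(train), len(val))  # → 42102 10526
```

For the 52,628 compressed trajectories, this yields 42,102 training and 10,526 validation trajectories.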
Two measures are used to evaluate prediction performance: RMSE and point-wise horizontal error. The results of the proposed hybrid method are compared with those of the ordinary LSTM network (2stage-PLS + LSTM). In both methods, 80% of the dataset is used for training and 20% for evaluation.
For a more detailed evaluation, the convergence behavior of the proposed hybrid context-aware method (2stage-CPLS + CLSTM) is compared with that of the ordinary LSTM network (2stage-PLS + LSTM), as illustrated in Fig. 12 and Fig. 13. The AIS dataset used in this paper includes seven vessel types: tanker, tug, cargo, sailing, passenger, other, and fishing. The "other" group refers to vessels whose vessel-type field is recorded as null [69]. By design, the CLSTM network contains a separate LSTM network for each vessel-type group. Therefore, for the AIS dataset used here, the CLSTM network is composed of seven LSTM networks.
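The per-type structure of the CLSTM network can be sketched as a router that holds one predictor per vessel-type group and dispatches on the vessel type of each trajectory, sending null types to the "other" group. The class and function names are hypothetical, and `constant_velocity` is only a stand-in for the trained LSTM of each group.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Vertex = Tuple[float, float]
Predictor = Callable[[Sequence[Vertex]], Vertex]

# The seven vessel-type groups in the AIS dataset; "other" holds null vessel types.
VESSEL_TYPES = ("tanker", "tug", "cargo", "sailing", "passenger", "other", "fishing")


def vessel_group(vessel_type: Optional[str]) -> str:
    """Map a raw vessel-type field to its group; a null or empty field maps to "other"."""
    if not vessel_type:
        return "other"
    group = vessel_type.strip().lower()
    if group not in VESSEL_TYPES:
        raise ValueError(f"unknown vessel type: {vessel_type!r}")
    return group


class ContextRouter:
    """Holds one predictor per vessel-type group and dispatches on the context."""

    def __init__(self, models: Dict[str, Predictor]) -> None:
        missing = set(VESSEL_TYPES) - set(models)
        if missing:
            raise ValueError(f"no model for groups: {sorted(missing)}")
        self.models = models

    def predict(self, vessel_type: Optional[str], history: Sequence[Vertex]) -> Vertex:
        return self.models[vessel_group(vessel_type)](history)


def constant_velocity(history: Sequence[Vertex]) -> Vertex:
    """Placeholder for a trained LSTM: extrapolate the last step."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


router = ContextRouter({group: constant_velocity for group in VESSEL_TYPES})
print(router.predict(None, [(0.0, 0.0), (1.0, 0.5)]))  # → (2.0, 1.0)
```

Keeping the models in a dictionary keyed by group means each per-type network can be trained, replaced, or evaluated independently while the router presents a single interface.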
Fig. 12 and Fig. 13 show that the deep neural networks learn to predict the movement behavior of the vessels. Because the data of a single vessel in the validation dataset are limited, after predicting for a period the data were replaced with those of another vessel to continue the prediction and validation process. This switch causes a sudden drop in prediction accuracy, but the accuracy appears to recover and stabilize within a short period. These fluctuations in validation accuracy are smaller for the hybrid method (2stage-CPLS + CLSTM) than for the ordinary LSTM network (2stage-PLS + LSTM). Hence, the convergence behavior of the hybrid method appears to be more predictable than that of the ordinary LSTM network.
This more predictable convergence behavior may result from integrating contextual information such as the vessel type. Since the movement behavior of vessels appears to depend on the vessel type [14], [39], [40], [43], using a "personalized" learning model for each vessel type can improve the learning process, as each model can focus on learning one type of behavior with fewer variations.
Based on convergence velocity, oscillation amplitude, and prediction accuracy, the proposed hybrid context-aware method (2stage-CPLS + CLSTM) appears to be superior to the ordinary LSTM network (2stage-PLS + LSTM). For a comprehensive analysis of the proposed method, the average point-wise error is calculated by comparing the outputs of both methods with the actual trajectories, and the results are summarized in Table 1. Vessel movement predictions with the 2stage-CPLS + CLSTM method are 15.68% more accurate than those of the ordinary LSTM network (2stage-PLS + LSTM). Fig. 14 shows an actual trajectory together with the trajectory predicted by the hybrid context-aware method.
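Assuming the 15.68% figure denotes the relative reduction in the average point-wise error with respect to the ordinary LSTM network (the exact definition is an assumption here), the comparison reduces to the arithmetic below. The input values are hypothetical and are not taken from Table 1.

```python
def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Percentage reduction in error of the proposed method relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical average point-wise errors, for illustration only.
print(relative_improvement(2.0, 1.5))  # → 25.0
```

A negative result would indicate that the proposed method's error is larger than the baseline's.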

VI. CONCLUSION
Predicting vessel movement trajectories plays an important role in managing safety at sea. While movement can be a function of geographic context, current systems and methods rarely incorporate contextual information into the analysis. This paper introduces a novel approach for integrating contextual data into the movement prediction process to improve the accuracy of vessel trajectory prediction. Trajectory prediction based on the proposed method can support navigational safety, trajectory planning, and risk monitoring.
This study provides a solution for both the compression and the movement trajectory prediction of vessels. The paper improves the classic compression algorithm by incorporating contextual information at different stages of trajectory mining and prediction. The novel prediction method builds on the strengths of both the curvilinear model and the deep learning LSTM network. The proposed hybrid method embeds contextual information as well as the vessels' historical AIS data to retain logical consistency and spatial accuracy in the predicted trajectories while increasing prediction accuracy. In addition, to avoid Simpson's paradox, a separate LSTM network is trained for each vessel type, allowing more accurate movement behavior predictions. All LSTM networks are then combined into a single CLSTM structure.
Moreover, this study introduces a way to calculate the optimal offset distance by optimizing the similarity measure and the compression ratio. The optimal threshold is selected to achieve the best trade-off between similarity and compression ratio for a dataset. At the optimal offset distance, the compressed trajectories are a very good replacement for the original dataset, while the volume of the dataset is also reduced.
The experimental results show that the 2stage-CPLS can potentially provide a higher trajectory similarity rate, improve compression efficiency, and retain logical consistency. Compared with the classic 2stage-PLS algorithm, the proposed algorithm's compression ratio appears to be lower, but its similarity quality is better. In terms of logical consistency, the 2stage-PLS algorithm appears to perform worse than the proposed algorithm, and the geometrical changes it introduces to the trajectories appear to be larger.
Based on the experiments performed, the proposed context-aware trajectory prediction method outperforms the ordinary LSTM network in convergence velocity, oscillation amplitude, and prediction accuracy, and it is 15.68% more accurate than the ordinary LSTM network.
As future work, it is recommended to explore the effects of environmental and sea-state factors (e.g., wave height and wind speed) on vessel movement prediction. A related opportunity for future work is fleet movement, i.e., considering other objects and their topological relationships to achieve better results.