A Road Network Enhanced Gate Recurrent Unit Model for Gather Prediction in Smart Cities

Gather prediction is an indispensable part of smart city projects. With gather predictions, the city government can respond in advance and greatly reduce the losses and risks caused by vicious gatherings. Compared with other trajectory prediction tasks (e.g., point-of-interest recommendation), gather prediction pays more attention to real-time trajectory data and requires stronger spatial-temporal dependence. It also focuses on scenes with multiple types of trajectories. Existing methods rely mainly on the trajectory data and ignore the great influence of the geographical environment (e.g., road network structure). Therefore, this paper transforms gather prediction into a trajectory prediction task under strong real-time conditions in a given city and derives the gathering situation by predicting users' aggregated movements in the next minutes or hours. A novel Spatiotemporal Gate Recurrent Unit (STGRU) model is proposed, in which spatiotemporal gates and a road network gate are introduced to capture the spatiotemporal relationships between trajectories. Compared with existing methods, we improve the performance of the model by adding road network structure and external knowledge, as well as time and distance gates designed to reduce the number of model parameters. The proposed STGRU is evaluated on three real-world trajectory datasets, and the experimental results demonstrate the effectiveness of the proposed model.


Introduction
In recent years, problems caused by the gathering of people have become one of the main factors hindering urban construction and development. Crowd gatherings are prone to accidents such as stampedes, fights, and injuries, which place high demands on the city's management and control capabilities. Gather prediction can therefore help the city government react in advance and significantly reduce the losses and risks caused by vicious gatherings [1,2]. As an indispensable part of smart transportation, gather prediction mainly leverages the trajectory data collected by various Internet of Things sensing devices [3][4][5][6] (such as mobile phones [7], cars, and other GPS devices [8]). These trajectory data include multiple types of movement patterns such as walking, driving, and public transportation. First, to take preventive measures in advance, the city government divides the city into regions, taking into account geographical features such as rivers and buildings. Then, it processes all the trajectory data available at the current moment and predicts the location of each trajectory at a certain time in the future. Finally, the gather is obtained by applying statistical methods or clustering algorithms to the trajectory prediction results.
To achieve high-accuracy trajectory prediction, current methods mainly focus on modeling the sequential nature of the trajectory data and the time and distance intervals between adjacent trajectory points. The main objective is to integrate temporal and spatial features to model user behavior patterns. Typical techniques such as recurrent neural networks (RNN) [9], long short-term memory (LSTM) [10], and the Gate Recurrent Unit (GRU) [11] have been successfully applied to various types of sequential data and have greatly improved performance. However, none of these methods considers the time intervals and geographic information [12] in the trajectory data. Some recent works extend RNN and LSTM to model the time and distance intervals between neighboring points. For example, ST-RNN [13] models spatiotemporal context by extending RNN, and HST-LSTM [14] merges the spatiotemporal influence into LSTM. More recently, STGN [15] achieved state-of-the-art results by designing two pairs of time and distance gates to model the time interval and the distance interval separately.
Nevertheless, trajectory data used for gather prediction usually suffers from data sparsity [16], due to the uneven sampling interval and distribution of sensing devices. Previous efforts tried to apply spatial-temporal relations to mitigate the problem of data sparsity, but the effect is not obvious. Inspired by Li et al. [17], we note that geographic environment information (such as road network structure) and external knowledge (such as weather information and holiday information) can effectively alleviate the problem of data sparsity. The geographic environment is essential for modeling the short-term and long-term behavior patterns of users, and weather and holiday information affects the overall behavior of users. For example, if a user's consecutive trajectory points lie on the same road segment, the current behavior patterns can be judged to be similar. Meanwhile, the road network information of the long-term historical trajectory can assist in modeling the long-term behavior pattern of users. Furthermore, on weekends, more people are willing to visit more distant areas and stay in a certain location for a long time. All of this side information can help mitigate the problem of data sparsity and improve the performance of gather prediction.
In order to make full use of external knowledge, this paper proposes a new spatiotemporal gated network that integrates road network structure and external knowledge, named Spatiotemporal Gate Recurrent Unit (STGRU). One pair of time gate and distance gate is designed to capture the short-term behavior pattern by utilizing time and distance intervals, and a road network gate is introduced to memorize road network structures and thereby model geographical environment constraints.
The proposed model abstracts the road network structure of the city into a planar graph and extracts the road network structure around each track point. The weather and holiday information are integrated into the track information as input. Moreover, STGRU can model the long-term and short-term behavior patterns of users and reduce the scale of model parameters to a certain extent. Finally, the proposed model processes the trajectory prediction results with statistical methods to achieve gather prediction. Experiments show that considering the road network structure and external knowledge can effectively improve the performance of the model.
Our contributions are summarized below:

Traffic information analysis [18] includes traffic flow forecasting and traffic demand forecasting. Traffic flow forecasting [19,20] and traffic congestion forecasting can help better regulate and control traffic and can effectively alleviate traffic congestion. The taxi demand forecasting method proposed by Geng et al. [21] can help taxi companies to better allocate vehicles. Li et al. [22] proposed a method for forecasting the demand for shared bicycles, which can optimize resource scheduling.
Mining and analysis of trajectory data can assist in traffic planning decisions. Wei et al. [23] used the number of stops and the parking position to analyze the effectiveness of the main line coordination.

Spatiotemporal Data Modeling.
Trajectory data is a type of spatiotemporal data, with the two dimensions of time and space. Mining spatiotemporal data is very difficult and is one of the current research hotspots. The Markov chain-based model [24] is a classic sequence model, and deep learning methods [25] such as RNN, LSTM, and GRU achieve excellent results in temporal modeling. Methods based on matrix factorization [26] or tensor factorization [27] can model spatial features. CNN [28,29] and GCN [30] are currently the best spatial modeling methods.
In order to capture the spatiotemporal features, Al-Molegi et al. (2016) proposed STF-RNN [31] to learn different temporal and spatial features. The TGCN [32] proposed by Zhao et al. uses GRU and GCN stacking to model spatiotemporal features. The STGCN [33] innovatively used CNN to model temporal features and achieved good results.

Gather Prediction.
Gather prediction is an indispensable part of smart transportation and includes many application scenarios, such as hotspot area analysis, passenger flow prediction, and population transfer prediction. Tomaharu et al. [34] proposed a collective graphical model to predict the transition populations between areas. Verma et al. [35] used trajectory data to mine hotspots and realized large-granularity gather prediction. Ni et al. [36] realized gather prediction between cities through passenger flow forecasting. Kumar et al. [37] used trajectory clustering and similarity analysis for gather prediction. Gather prediction can also be transformed into a multitrajectory prediction task under the same time and space, which focuses on the temporal and spatial characteristics shared among multiple trajectories.

Trajectory Prediction.
Different from other prediction tasks, trajectory prediction relies mainly on geographic information and time information. Trajectory prediction can also use position semantics, speed, and direction. Among traditional probabilistic approaches, matrix factorization decomposes the matrix into low-rank matrices to obtain the implicit feature vectors of the user and the trajectory. Tensor factorization expands this to three dimensions, including user, time information, and spatial information. The sampling of Kurashima et al. [38] is based on the subject and the distance between the user and the historical location. Liu et al. [39] combined location semantics to embed geographic context information. Research has shown that the order of consecutive trajectory points plays a vital role in trajectory prediction, and it is even more significant in strong real-time trajectory data, because human behavior patterns are sequential. For prediction based on sequential data, the Markov chain model [40] is the most classic. Cheng et al. proposed a tensor-based model, named FPMC-LR [41], by fusing first-order Markov chains and distance constraints. Feng et al. proposed a personalized ranking metric embedding method (PRME) [42], which embeds the states at all times uniformly and calculates the Euclidean distance between vectors to measure similarity. Neural networks are widely used in various tasks because they can learn to model various nonlinear features. The ST-RNN proposed by Liu et al. (2016) is the first method to introduce a deep neural network into trajectory prediction; it uses spatiotemporal information to extend RNN and improves the prediction effect. STF-RNN replaced the transition matrix with an internal representation of automatically extracted spatiotemporal features, which can more effectively discover useful features for modeling human behavior. Zhu et al. [43] considered modeling time intervals to improve performance and equipped LSTM with time gating. Yang et al. [44] used neural network models to model social network structure and user trajectory behavior patterns. HST-LSTM introduces spatiotemporal factors into the existing gates of LSTM to model spatiotemporal features.
The recently proposed STGN considers the spatiotemporal context. Our proposed STGRU differs from STGN in the following ways. First, STGRU is built on GRU, which reduces the number of parameters and is more suitable for real-time trajectories. STGN adds time and space gates to LSTM, and its number of parameters is more than twice that of LSTM. Secondly, STGRU is equipped with an external knowledge gate that extracts the road network structure to enhance the spatiotemporal characteristics and to capture the influence of external knowledge on the overall movement pattern. In contrast, STGN is based only on the trajectory of a single user, which makes it difficult to capture the spatiotemporal relationship between users.

Method
In this section, we first give the definitions of gather prediction and introduce preliminaries for GRU. Then, we propose the Spatiotemporal Gate Recurrent Unit (STGRU), which uses time and distance intervals and road network structure to model short-term and long-term behavior patterns of users.

Overview.
As shown in Figure 1, we perform trajectory prediction by stacking an STGRU layer and a softmax layer and then compare the result of trajectory prediction with the threshold η to obtain the result of gather prediction.
In the proposed STGRU, three gates are designed to extract spatiotemporal features and model user behavior patterns. The time gate and the distance gate learn the time interval and distance interval in the trajectory, capturing users' short-term behavior patterns. The road network gate aims to capture road network structural features, which affect both short-term and long-term behavior patterns.
In this paper, we only discuss the meshing method of dividing the area, because meshing has the highest applicability. The STGRU model is also applicable to other area division methods, which is reflected in the comparative experiments.

Problem Formulation.
Let $U = \{u_1, u_2, \cdots, u_M\}$ be the set of $M$ users. The city is divided into grids of side length $a$, so that each grid has area $a^2$, and the grids are numbered; each grid corresponds to a unique area ID $r$. User $u$ has a sequence of historical regions visited up to time $t_{i-1}$, represented as $H^u_i = \{r^u_{t_1}, r^u_{t_2}, \cdots, r^u_{t_{i-1}}\}$, where $r^u_{t_i}$ denotes the region that user $u$ visits at time $t_i$.
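As a minimal sketch of the gridding step, the function below maps a latitude/longitude pair to a grid ID for square cells of a given side length. The function name, the south-west origin `(lat0, lon0)`, the row-major numbering, the `n_cols` parameter, and the equirectangular distance approximation are all assumptions made for illustration; the text does not specify how the grids are numbered.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def region_id(lat, lon, lat0, lon0, side_km=5.0, n_cols=100):
    """Return the ID of the square grid cell that contains (lat, lon).

    (lat0, lon0) is the south-west corner of the gridded area (assumed),
    side_km is the cell side length a, and cells are numbered row by row
    with n_cols cells per row (assumed numbering scheme).
    """
    # Equirectangular approximation: adequate for a city-sized area.
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat0))
    north_km = (lat - lat0) * km_per_deg_lat
    east_km = (lon - lon0) * km_per_deg_lon
    row = int(north_km // side_km)
    col = int(east_km // side_km)
    if row < 0 or not 0 <= col < n_cols:
        raise ValueError("point lies outside the gridded area")
    return row * n_cols + col
```

With this numbering, a point roughly 5.5 km north of the origin falls in the second row, so its ID equals `n_cols`.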
The goal of gather prediction is to predict the regions where all users are located at time $t_i$. Specifically, the higher the prediction score $s^u_{r,t_i}$ of user $u$ for region $r$ at time $t_i$, the higher the probability that user $u$ is located in region $r$ at time $t_i$.

Figure 1: Gather prediction network model based on STGRU.
According to the prediction scores of all users, predictions of $M \times k$ possible regions can be obtained. The number of people in region $r$ is obtained by counting the prediction results, and whether a gather occurs in region $r$ is judged by comparing this number with the threshold $\eta$:

$$r_{state} = \begin{cases} 1, & number_{r,t_i} > \eta, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where $r_{state}$ is the gather state of region $r$ (1 means there is a gather in region $r$, 0 means there is not), and $number_{r,t_i}$ is the number of people in region $r$ at time $t_i$.

The structure of GRU is simpler than that of the LSTM network, while its effect is very good. In order to reduce the number of parameters and better suit real-time data, our method uses the standard GRU as shown in Figure 2. Compared with the standard LSTM network, GRU combines the forget gate and input gate of LSTM into a single update gate, removes the cell state, and uses the hidden state to transfer information. The basic update formula of GRU is as follows:

$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r), \tag{2}$$
$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z), \tag{3}$$
$$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h), \tag{4}$$
$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t. \tag{5}$$

Assume that the number of hidden units is $h$, the batch input at time step $t$ is $X_t \in \mathbb{R}^{n \times d}$, and the hidden state of the previous time step is $H_{t-1}$. $R_t, Z_t \in \mathbb{R}^{n \times h}$ represent the reset gate and the update gate, where $\sigma(\cdot)$ is the logistic sigmoid function. $\tilde{H}_t \in \mathbb{R}^{n \times h}$ represents the candidate hidden state at time step $t$, where $\tanh(\cdot)$ is the hyperbolic tangent function. $W_{xr}, W_{xz}, W_{xh} \in \mathbb{R}^{d \times h}$ and $W_{hr}, W_{hz}, W_{hh} \in \mathbb{R}^{h \times h}$ are the weight parameters, and $b_r, b_z, b_h \in \mathbb{R}^{1 \times h}$ are the corresponding biases. $\odot$ denotes the element-wise (Hadamard) product.
The reset gate $R_t$ controls how the hidden state of the previous time step flows into the candidate hidden state of the current time step and captures long-term dependencies in the time series. The update gate $Z_t$ controls how the hidden state is updated by the candidate hidden state, which contains the current time step information, and captures short-term dependencies in the time series.

Components.
As shown in the two dotted red rectangles in Figure 3, STGRU adds a time gate, a distance gate, and a road network gate, which are denoted as $T_t$, $D_t$, and $Road_t$, respectively. $T_t$ and $D_t$ are used to model the influence of the time interval and the distance interval on trajectory prediction, and $Road_t$ is used to capture the influence of road network structure on behavior patterns. The three gates are defined on top of GRU and take as side information the time interval $\Delta t_t$, the distance interval $\Delta d_t$, and the road network information $r_t$, respectively.
Combining $T_t$, $D_t$, and $Road_t$ with the reset gate and the update gate yields a second reset gate state $\tilde{R}_t$ and a second update gate state $\tilde{Z}_t$, and Eqs. (4) and (5) are then modified to use $\tilde{R}_t$ and $\tilde{Z}_t$. $T_t$ is equivalent to the time interval input information filter, $D_t$ is equivalent to the distance interval input information filter, and $Road_t$ is equivalent to the road network input information filter.
The candidate hidden state $\tilde{H}_t$ is determined by the input information, the reset gate, and the hidden state of the previous time step. The second reset gate state $\tilde{R}_t$ is designed to memorize the user's long-term road network access information: $Road_t$ memorizes the road network information $r_t$ and passes it to $\tilde{R}_t$ and further to $\tilde{H}_t$, which helps model the long-term behavior pattern of users.
The update gate $Z_t$ can capture short-term dependencies. Therefore, a time gate and a distance gate are designed and combined with the above road network gate to control the update gate state. $T_t$ memorizes the $\Delta t_t$ between the track points and, following LSTM, is incorporated into the second update gate state $\tilde{Z}_t$ through the element-wise (Hadamard) product. Similarly, $D_t$ memorizes the $\Delta d_t$ between the track points, and $Road_t$ memorizes the road network information and integrates it into $\tilde{Z}_t$. Modeling the distance interval can help capture the user's spatial behavior patterns, and modeling the time interval can help capture the user's behavior patterns such as speed and state. Modeling the road network structure can help capture users' short-term behavior constraints and long-term goals, as well as the spatial relationships between users.
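A minimal single-sample sketch of one STGRU step is given below in plain Python. The multiplicative (Hadamard) combination of the gates with the reset and update gates follows the description above. The concrete form of each added gate (an input term plus a sigmoid-squashed side signal, in the style of STGN), the helper names, and all parameter names in the dictionary `p` are assumptions made for illustration and should not be read as the exact equations of the model.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(x, W, b):
    """Row vector x times matrix W (len(x) rows, len(b) columns), plus bias b."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def hadamard(*vectors):
    """Element-wise product of one or more equally long vectors."""
    out = list(vectors[0])
    for v in vectors[1:]:
        out = [a * b for a, b in zip(out, v)]
    return out


def side_gate(x, side, W_x, W_side, b):
    """ASSUMED gate form: sigmoid(x W_x + sigmoid(side W_side) + b)."""
    zero = [0.0] * len(b)
    return sigmoid(add(affine(x, W_x, b), sigmoid(affine(side, W_side, zero))))


def stgru_step(x, h_prev, dt, dd, road, p):
    """One step for a single sample; returns the new hidden state (a list)."""
    zero = [0.0] * len(h_prev)
    # Standard GRU reset and update gates.
    R = sigmoid(add(affine(x, p["W_xr"], p["b_r"]), affine(h_prev, p["W_hr"], zero)))
    Z = sigmoid(add(affine(x, p["W_xz"], p["b_z"]), affine(h_prev, p["W_hz"], zero)))
    # Added gates: time, distance, and road network (forms are assumptions).
    T = side_gate(x, [dt], p["W_xt"], p["W_t"], p["b_t"])
    D = side_gate(x, [dd], p["W_xd"], p["W_d"], p["b_d"])
    G = side_gate(x, road, p["W_xg"], p["W_g"], p["b_g"])
    # Second reset gate state: only the road network gate is combined.
    R2 = hadamard(R, G)
    # Second update gate state: all three added gates are combined.
    Z2 = hadamard(Z, T, D, G)
    pre = add(affine(x, p["W_xh"], p["b_h"]),
              affine(hadamard(R2, h_prev), p["W_hh"], zero))
    cand = [math.tanh(a) for a in pre]
    return [z * hp + (1.0 - z) * c for z, hp, c in zip(Z2, h_prev, cand)]
```

In this sketch, replacing any of `T`, `D`, or `G` by a vector of ones removes its effect, which corresponds to "closing" that gate in the ablation experiments.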
The method of adapting the model for gather prediction is as follows. First, the time interval and distance interval between adjacent track points are calculated, and $H^u$ is converted into a sequence that carries these intervals. Secondly, weather, holiday, and road network information is added, so that each track point is further transformed into a tuple containing $r^u_t$, $weather_t$, $date_t$, and $t^u_t$, where $r^u_t$ contains longitude, latitude, and located region; $weather_t$ contains the highest temperature, lowest temperature, and average temperature; and $date_t$ is marked with 0 or 1 according to whether the date is a holiday. Then, $X_t$ in STGRU is equivalent to $(r^u_t, weather_t, date_t)$, $\Delta t_t$ is equivalent to $t^u_t - t^u_{t-1}$, and $\Delta d_t$ is equivalent to $d(l_{t-1}, l_t)$, where $d(\cdot, \cdot)$ is the function that computes the distance between two track points. $road_t$ is a vector formed by concatenating the node of the road network graph on which the trajectory point is located with its neighbor nodes. For example, the road segment where trajectory point $i$ is located is represented as $node_i$, and the $c$ neighbor nodes of $node_i$ are represented as $node_{i+1}, node_{i+2}, \cdots, node_{i+c}$. Then, $road_t$ can be expressed as $road_t = (node_i, node_{i+1}, \cdots, node_{i+c})$. In addition, in order to extract the behavior pattern of the group, we model all users jointly and delete the user ID. This paper uses a single-layer network model for the comparison experiments, adds a softmax layer for output, and uses categorical cross entropy as the loss function.
Finally, the forecast results are counted within the same time and space. When the count for a region $r$ exceeds the set threshold $\eta$, the region is considered to have a gather.
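The final counting step can be sketched as follows. Both function names and the dictionary-based score format are hypothetical; the only behavior taken from the text is that predicted regions are counted per region for one time step and a region is flagged when its count exceeds η.

```python
from collections import Counter


def top_k_regions(scores, k=1):
    """Return the k region IDs with the highest scores (best first).

    scores maps region ID -> softmax score for one user (assumed format).
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def gather_regions(predicted_regions, eta):
    """Count predicted regions for one time step and apply the threshold.

    predicted_regions is an iterable of region IDs pooled over all users.
    Returns a dict region ID -> 1 if its count exceeds eta, else 0.
    """
    counts = Counter(predicted_regions)
    return {region: int(n > eta) for region, n in counts.items()}
```

For example, pooling the top-1 region of six users as `[3, 3, 3, 7, 7, 9]` with `eta=2` flags only region 3, since it is the only region whose count is strictly greater than the threshold.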

Analysis and
The number of parameters of STGRU is slightly larger than that of LSTM and one-third to one-half smaller than that of STGN.
The optimizer is Adam, a variant of stochastic gradient descent (SGD) that uses estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradient to compute the update step of each parameter. It adapts the learning rate automatically and is well suited to scenarios with large-scale data and many parameters.
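For reference, a single Adam update on a flat list of parameters can be sketched as below. The function name and list-based interface are illustrative; the default hyperparameters are the commonly used Adam defaults and are not taken from the experiments in this paper.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count.

    params, grads, m, and v are equally long lists of floats, where m and v
    hold the running first and second moment estimates.
    Returns the updated (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first moment (mean)
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second moment (uncentered)
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient scale, which is the sense in which the step length is adjusted automatically.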

Experiments
In this section, we conduct experiments to evaluate the performance of our proposed model STGRU on three real-world datasets. We eliminate the data of users whose trajectory length is less than 30 in the three datasets and then take 70% of the users as the training set and the remaining 30% as the testing set.
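The filtering and user-level split can be sketched as follows. The function name, the dictionary input format, and the seeded shuffle before splitting are assumptions; the text only states the length cutoff of 30 and the 70%/30% split by user.

```python
import random


def split_users(trajectories, min_len=30, train_frac=0.7, seed=0):
    """Drop short trajectories and split the remaining users into two sets.

    trajectories maps user ID -> list of track points (assumed format).
    Users are shuffled with a fixed seed before splitting (assumption).
    Returns (train_users, test_users).
    """
    users = sorted(u for u, traj in trajectories.items() if len(traj) >= min_len)
    random.Random(seed).shuffle(users)
    cut = int(round(train_frac * len(users)))
    return users[:cut], users[cut:]
```

Splitting by user rather than by sample keeps every trajectory of a given user on one side of the split.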
The three datasets do not contain the area information of the track points. This paper divides each area into regions of 5 km × 5 km and determines the region where each track point is located according to its latitude and longitude in the dataset. A sliding window is used to generate samples on both training and test data, and the time interval is randomized within the sliding window to increase the complexity of the trajectory.
For example, if the time of the first track point is 8:00 am, the random interval is [7,12,18,24,30,31]. Assuming the random number is 3, the time interval between the second track point and the first track point is 3 × 5 min = 15 min, so the second track point is at 8:15 am.
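One way to read the random-interval sampling is sketched below: starting from a given index, each next point in the window is taken a random number of base steps (5-minute samples) further along the track. The function name, the uniform draw from 1 to `max_gap`, and returning `None` when the track is too short are assumptions for illustration.

```python
import random


def sample_window(track, start, window=10, max_gap=3, rng=random):
    """Pick `window` points from `track`, beginning at index `start`.

    Consecutive picks are k base steps apart, with k drawn uniformly from
    1..max_gap (assumed sampling rule). Returns None if the track ends
    before the window is full.
    """
    idx = start
    picked = [idx]
    while len(picked) < window:
        idx += rng.randint(1, max_gap)
        if idx >= len(track):
            return None
        picked.append(idx)
    return [track[i] for i in picked]
```

With a 5-minute base step, a drawn value of 3 places the next point 15 minutes after the previous one, as in the example above.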

Baseline Methods.
We compare our proposed model STGRU with five representative methods for trajectory prediction.
(i) RNN [9]: it passes the state cyclically in its own network, so it can accept a wider range of time series inputs and is widely used for time series prediction tasks
(ii) LSTM [10]: this model is suitable for processing and predicting important events with very long intervals and delays in time series. To a certain extent, it solves the vanishing and exploding gradient problems of RNN
(iii) GRU [11]: a variant of the LSTM model, which has fewer parameters than LSTM and shows better performance on certain smaller and less frequent datasets
(iv) HST-LSTM [14]: it integrates spatiotemporal influence into the three gates of LSTM. Since there is no session information in the datasets, its ST-LSTM version is used here
(v) STGN [15]: obtained by enhancing LSTM, introducing two pairs of spatiotemporal gates to capture spatiotemporal relationships

Evaluation Metrics.
In order to evaluate the performance of our proposed STGRU model and compare it with the above five baselines, we use two standard metrics: area under the curve (AUC) and Recall@K. The trajectory prediction task is essentially a multiclass classification task, for which AUC evaluates the classification effect well. Recall@K is defined as the ratio of the number of correct predictions to the total number of predictions. First, all possible regions are arranged in descending order of their predicted probability. Then, the recall score is calculated as the percentage of times the true region is found among the top K most likely regions. In this paper, we use K = 1, 5, 10, 15, and 20 to report Recall@K. Let $U$ be the set of users, $L_u$ the set of real regions of user $u$ in the testing data, and $P_{K,u}$ the set of top K predicted regions; the calculation formula for Recall@K is:

$$\text{Recall@}K = \frac{1}{|U|} \sum_{u \in U} \frac{|P_{K,u} \cap L_u|}{|L_u|}.$$

Table 1 shows the performance of our proposed model STGRU and of the five baselines evaluated by Recall@K and AUC on the three datasets. The hidden state size is set to 32 in our experiment, the number of epochs is set to 200, and the batch size is set to 512. The sliding window size is set to 10, and the random time interval in the Nagoya dataset and the Osaka dataset is 1 to 3. The random time interval of the Tokyo dataset is 1 to 5, because the data density in the Tokyo dataset is higher and the complexity of the data needs to be increased by enlarging the random interval. To be fair, all baseline experiments in this paper use the same hyperparameter settings. From the experimental results, the following observations can be made. The proposed STGRU model is significantly better than the existing state-of-the-art methods in all indicators on the three datasets.
The performance gains provided by STGRU over these five counterparts are about 18.1%-110.2%, 5.7%-74.7%, and 3.7%-24.1% in terms of the Recall@1 metric on the Nagoya, Osaka, and Tokyo datasets, respectively. The results show that modeling the road network structure in STGRU helps model user behavior patterns and that modeling short-term temporal and spatial contexts improves the effect on strong real-time data, both of which benefit the task of trajectory prediction. This is because the added road network gate is combined with the update gate to integrate the short-term road network characteristics into the model and with the reset gate to integrate the long-term road network characteristics.
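The Recall@K metric described in the Evaluation Metrics section, read as the fraction of test samples whose true region appears among the top K ranked regions, can be sketched as follows. The function name and the list-based input format are assumptions for illustration.

```python
def recall_at_k(ranked_predictions, true_regions, k):
    """Fraction of samples whose true region is among the top-k predictions.

    ranked_predictions[i] is a list of region IDs for sample i, sorted from
    most to least likely; true_regions[i] is the true region ID of sample i.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_regions)
        if truth in ranked[:k]
    )
    return hits / len(true_regions)
```

By construction the value is non-decreasing in K, which is why Recall@20 is always at least as large as Recall@1 for the same model.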
In addition, RNN performs better than LSTM on the three datasets. This is because RNN has short-term memory: the closer a track point is in time, the greater its weight. Even with the random intervals added, the samples retain strong real-time characteristics, so RNN performs better. Similarly, GRU is superior to LSTM in modeling strong real-time data. HST-LSTM and STGN perform better than the above three models, which shows the importance of spatiotemporal factors for trajectory prediction. Between them, STGN performs better than HST-LSTM, which suggests that capturing spatiotemporal effects through dedicated gates is more effective than modifying the existing LSTM gates. The reason may be the increase in the number of parameters.
In the three datasets, each dataset covers a total of 6 days of trajectory data in an area of Japan, and the time interval between adjacent track points is 5 minutes. Each area can be divided into about 5000 to 10000 regions. Taking the NPF dataset as an example, the number of regions is about 5000. It can be calculated that the size of the spatiotemporal matrix of the dataset is about 5000 × 3000. However, the number of trajectory points in the NPF dataset is only one million. After removing the repeated spatiotemporal regions, the size of the track point coverage matrix is less than 1% of the size of the spatiotemporal matrix of the dataset. RNN, LSTM, and GRU are directly trained on the spatiotemporal matrix, which will lead to the problems of data sparsity. STGRU adds constraints between track points through time intervals, distance intervals, and road network structure. While the STGRU is being trained, only the local area covered by each sample needs to be considered, which greatly alleviates the problems of data sparsity in the dataset and can better model user behavior patterns compared to the above three models.

Impact of Parameters.
In the standard RNN, different cell sizes lead to different performance. We study the impact of cell size on STGRU by setting it to 32, 64, 128, 256, and 512. Table 2 shows that increasing the cell size up to a certain point improves the performance of the model, whereas an overly large cell size increases the training time and results in a decline in performance. When the number of model units is fixed, the cell size determines the complexity of the model, and a larger cell size may fit the data better.

Effectiveness of Time and Distance Gates.
STGRU has a time gate and a distance gate that are combined with the update gate to capture short-term dependencies, so it is important to verify how effectively these gates model time and distance intervals. The time gate and distance gate can be closed by setting $T_t = 1$ and $D_t = 1$. In order to eliminate the interference of the road network gate, the road network gate in STGRU is also closed. Three sets of experiments are conducted, closing the time gate, closing the distance gate, and closing both gates at the same time, to compare and verify the effectiveness of the time gate and the distance gate.
From Table 3, it can be found that the time gate and the distance gate have similar importance on the datasets. Compared with GRU, the performance improvement of GRU + $D_t$ + $T_t$ on the four evaluation metrics is 27.67%, 18.04%, 11.22%, and 7.23%, respectively. The performance difference between GRU + $T_t$ and GRU + $D_t$ is very small, indicating that the distance interval and the time interval have similar effects on modeling behavior patterns. The additional improvement of GRU + $D_t$ + $T_t$ over either single-gate variant is also small, indicating that the characteristics of the time interval and the distance interval overlap to a large degree on the testing dataset.

Effectiveness of Road Network Gates.
STGRU has a road network gate, which is integrated with the update gate and the reset gate to capture long-term and short-term road network dependencies. This group of experiments studies the role of the road network gate in the update gate and in the reset gate. The road network gate can be closed by setting $Road_t = 1$ in $\tilde{R}_t$ and $\tilde{Z}_t$, respectively. The effectiveness of the road network gate in capturing long-term and short-term dependencies is verified by three sets of experiments, namely, closing all road network gates and closing each single road network gate.
As shown in Table 4, closing a single road network gate performs worse than closing all road network gates. This may be because the long-term and short-term features of the road network structure need to be used together. Closing a single road network gate makes the road network information ineffective for the prediction, while some parameters are still spent on modeling road network characteristics, which lowers the performance of the model. As a result, the two variants with a single closed road network gate perform almost the same.

Impact of the Sliding Window Size.
In our experiment, samples are obtained through a sliding window. The size of the sliding window limits the trajectory length of a single input. In order to compare the performance of our model under sliding windows of different sizes, the window length is set to 10, 15, 20, 25, and 30. To ensure a sufficient number of samples under larger sliding window sizes, the experiments are conducted on the Tokyo People Flow dataset, because this dataset has the longest average trajectory length.
The sample length determines the length of the model unit, as well as the number of parameters of the model. As shown in Table 5, as the length of the sliding window and the number of model parameters increase, the overall performance of the model improves to a certain degree. When the sliding window size is set to 30, the model complexity is 3 times that with a sliding window size of 10, and the changes in the four metrics are 8.68%, -4.85%, 11%, and 3.37%, respectively. Although increasing the sample length can improve the performance of the model, the actual application scenarios of the gather prediction task must be considered. A sample length within 10 is more meaningful, which is the main reason the sliding window size is set to 10 in our comparison experiments with the baselines.

Impact of the Random Interval Size.
Another important parameter is the size of the random interval. The complexity of the trajectory samples is increased by randomly sampling the time interval between the track points in the sliding window. To study the impact of different random intervals on the complexity of the trajectory samples and on model performance, comparison experiments are conducted on the Tokyo People Flow dataset with random interval sizes of 3, 5, 7, 9, and 11. The Tokyo People Flow dataset is chosen because its trajectories are strongly continuous, so a certain degree of added complexity is required to reflect the purpose of the experiment.
According to Table 6, the model performs best when the random interval size is 5 or 7, and a random interval size that is too large or too small decreases model performance. On the other two datasets, the model performs better when the random interval size is 3, which is why the three datasets use different random intervals in the comparison experiments with the baselines. Using random intervals makes the samples closer to real-world data.

Case Study.
To verify that STGRU can process and predict both short and long trajectory data, two sets of experiments are conducted with the baseline models. If a user's trajectory data is scarce, the user's behavior pattern can hardly be learned, which places higher demands on the model. The first set of experiments is based on the Tokyo People Flow dataset, uses data with a track length of less than 30 without random intervals, and takes Recall@K as the evaluation metric. As shown in Figure 4, STGRU has the best performance on Recall@1 and Recall@5, which indicates that STGRU can better handle sparse data.
In the second set of experiments, data with a track length greater than 200 is used with the parameter settings of the comparative experiment. As shown in Figure 5, STGRU is also superior to all baselines on long trajectory data, which indicates that STGRU can extract and use long-term features very well, and in particular that long-term road network features are effective for modeling strong real-time data.

Conclusion
In this paper, we propose a Spatiotemporal Gate Recurrent Unit (STGRU) model that enhances the Gate Recurrent Unit for gather prediction. In STGRU, the time gate and distance gate are introduced to model the time interval and distance interval between consecutive trajectory points, which are essential to describe the short-term behaviors of users, and the road network gate is introduced to model the long-term and short-term road network structure. We believe that the geographical environment represented by the road network structure is very important for both the short-term and long-term behaviors of users. The three gates are combined with the update gate in the GRU to extract users' short-term behavior patterns, and only the road network gate is combined with the reset gate in the GRU to extract users' long-term behavior patterns. Experimental results on three real-world datasets demonstrate the effectiveness of our model, which outperforms the latest methods.
In future work, we will further incorporate the structured representation of road network information into the model to further improve the aggregation prediction performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.