Location Prediction on Trajectory Data: A Review

Abstract: Location prediction is a key technique in many location-based services, including route navigation, dining location recommendation, and traffic planning and control, to mention a few. This survey provides a comprehensive overview of location prediction, including basic definitions and concepts, algorithms, and applications. First, we introduce the types of trajectory data and related basic concepts. Then, we review existing location prediction methods, ranging from temporal-pattern-based prediction to spatiotemporal-pattern-based prediction. We also discuss the advantages and disadvantages of these algorithms and briefly summarize current applications of location prediction in diverse fields. Finally, we identify potential challenges and future research directions in location prediction.


Introduction
Urban planning, relieving traffic congestion, and effective location recommendation systems are important objectives worldwide and have received increasing attention in recent years. Spatiotemporal data mining is the key technique involved in these practical applications [1][2][3]. Trajectory data bring new opportunities and challenges to mining knowledge about moving objects. To date, many researchers have used trajectory data to mine latent patterns hidden in the data, and these patterns can be used to analyze the behavior of moving objects. Location prediction, as a primary task of spatiotemporal data mining, predicts the next location of an object at a given time. In recent years, researchers have made much progress in location prediction. For instance, early studies traced student ID cards to identify frequent temporal patterns and used these patterns to predict the students' next locations [4][5][6][7]. Since then, location prediction has found a wide range of applications in daily life, e.g., travel recommendation, location-aware advertising, and early warning of potential public emergencies, to mention a few [8][9][10]. Location prediction typically involves many techniques, including trajectory data preprocessing, trajectory clustering, trajectory pattern mining, trajectory segmentation, and trajectory semantic representation. In this article, we review the field of location prediction, including its basic definitions, typical algorithms, model evaluation, and diverse applications. Our objective is to present a comprehensive picture of location prediction.

Ruizhi Wu, Guangchun Luo, Junming Shao, Ling Tian, and Chengzong Peng are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China. Email: ruizhiwuuestc@gmail.com; gcluo@uestc.edu.cn; gcluo.uestc@gmail.com; junmshao@uestc.edu.cn; lingtian@uestc.edu.cn; prescott0307@outlook.com. To whom correspondence should be addressed. Manuscript received: 2017-11-20; accepted: 2018-01-10.
The remainder of this paper is organized as follows. In Section 2, we introduce the basic concepts of location prediction, including the different sources of trajectory data, the challenges in location prediction, and the general prediction framework. In Section 3, we introduce some common trajectory data preprocessing methods. Section 4 comprises the core of this review, in which we describe different models and briefly introduce the motivation, basic idea, and key techniques of each. In Section 5, we present some public datasets and typical evaluation strategies used in location prediction. In Section 6, we introduce some real-world applications and future directions. We conclude our overview in Section 7.

Preliminary
Trajectory data characterize the locations and times of moving objects. Formally, a trajectory data point is represented as $p = (x, y, h, t)$, where $x$ and $y$ are the latitude and longitude of a given moving object, respectively, $h$ is the altitude, and $t$ is the time stamp. In many real-world applications, $h$ is ignored, so $p = (x, y, t)$ is commonly used. Trajectory data are thus composed of trajectory data points in chronological order, and a trajectory $Tra$ is often represented as $Tra = \langle p_1, p_2, \dots, p_i, \dots, p_n \rangle$. In the following subsection, we introduce different types of trajectory data sources, their unique characteristics, and the corresponding challenges in location prediction.
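The definitions above map directly onto a small data structure. The following is a minimal sketch, assuming time stamps are plain numbers (e.g., seconds); the names `TrajectoryPoint` and `make_trajectory` are hypothetical and only illustrate $p = (x, y, h, t)$ and the chronological ordering of $Tra$.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrajectoryPoint:
    """One trajectory data point p = (x, y, h, t).

    The altitude h is placed last only because a dataclass field with a
    default must follow the required fields; h=None corresponds to p = (x, y, t).
    """
    x: float  # latitude, following the convention in the text
    y: float  # longitude
    t: float  # time stamp (assumed: seconds since an arbitrary epoch)
    h: Optional[float] = None  # altitude, often ignored


def make_trajectory(points: List[TrajectoryPoint]) -> List[TrajectoryPoint]:
    """Return Tra = <p_1, ..., p_n>, i.e. the points sorted chronologically."""
    return sorted(points, key=lambda p: p.t)
```

Sorting by `t` is all that "chronological order" requires; real pipelines would usually also deduplicate points with identical time stamps.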

Trajectory data sources
There are many different types of trajectory data in real-world scenarios. Zheng [11] classified trajectory data as either actively or passively recorded, depending on how the trajectories are generated. We briefly introduce these two types of trajectory data below.
Active recording trajectory data: People actively record their locations when they log in to social networks or travel to places of interest and share their life experiences. Typical data types include check-in data (e.g., from Twitter, Weibo, and QQ) and location-based data such as travel photos.
Figure 1 shows the correlation between users and locations in social networks. In Flickr, a sufficient number of geotagged photos can be formulated as a trajectory, in which each photo is associated with a location tag and a time stamp. Because users record their locations irregularly, actively recorded data are typically sparse. When mining such trajectory data, additional social information is usually incorporated.

Fig. 1 User-location graph for a Location-Based Social Network (LBSN), comprising user and location correlations [14].
Passive recording trajectory data: With the development of positioning techniques, many moving objects are equipped with positioning devices that record location information, such as Global Positioning System (GPS) receivers in vehicles and Radio-Frequency Identification (RFID) devices for tracing objects. Typically, these devices automatically record huge volumes of trajectory data points [12,13]. Transaction records and the Internet traces of human beings can also be viewed as trajectory data, since they record locations and times.

Challenges in location prediction
In contrast to traditional data, the unique properties of trajectory data (e.g., different sampling rates, different lengths, and sparsity) mean that location prediction faces many challenges.

Randomness of movement behaviour
Since the current location of a user is related to that user's history of visited locations, trajectory data are context-sensitive. However, in contrast to other data, mobility patterns are difficult to identify: features in trajectory data are fuzzy, and there is no exact standard for what constitutes a mobility pattern. Researchers often mine mobility patterns based on association rules, time periods, and transition probabilities between locations. Such patterns can sometimes represent user movement behaviour, but they cannot fully describe the moving process of users because of the inherent randomness of mobility.

Time sensitivity
Trajectory data are time sensitive. Because users can move quickly, a user often visits many locations in a short period of time, which makes time difficult to handle. Traditional methods often use a time window, but this strategy is not always optimal, because (1) users move quickly and randomly, and a time window cannot capture changes in the trajectory data; (2) the length of a time window is difficult to establish; and (3) a time window handles time discretely, which makes it difficult to determine the correlation between time and location.

Cold start and sparsity problems
If a user has no trajectory history, it is difficult to build a predictor of his future location; this is known as the cold start problem. If a user has visited only a few locations, it is also difficult to build such a predictor; this is known as the sparsity problem. Cold start and sparsity problems are prevalent in prediction applications, especially those using actively recorded trajectory data.

Heterogeneous data
The sources of trajectory data are diverse, e.g., taxis, buses, and people. These moving objects often have different sampling rates and movement patterns. In many scenarios, datasets also include social relations or short messages. Such heterogeneous data therefore represent another challenge for location prediction systems.

Location prediction framework
Human trajectory data reveal the movement preferences and behaviours of people in daily life.
Trajectory data mining involves trajectory data processing, management, and pattern mining from past trajectories [11]. Location prediction, as a primary task in trajectory data mining, learns the movement patterns of moving objects from their past locations and then forecasts their future locations. Among traditional trajectory mining tasks, trajectory pattern mining, trajectory similarity measurement, and trajectory anomaly detection are closely related to location prediction. The objective of location prediction is to predict the next location and/or the next time of visiting a given location. The former treats trajectory data as a spatial sequence, whereas the latter treats trajectory data as a spatiotemporal sequence.
Figure 2 shows the framework of the location prediction process, which has three steps. (1) Because trajectory data are sampled by various positioning devices, their quality is often low, so data preprocessing is necessary. Trajectory data preprocessing involves noise filtering, data cleaning [15], and trajectory data compression [16][17][18], and sometimes special preprocessing such as trajectory segmentation [19][20][21], trajectory semantics [13,22,23], or map matching [24][25][26]. (2) Learning from past trajectory data is the modeling step, a key aspect of location prediction, in which the movement of a moving object is modeled. The performance of a location prediction algorithm mainly depends on the proposed model. (3) Location prediction and result evaluation are indispensable parts of an established location prediction system.
In this section, we reviewed some basic concepts of location prediction, including the data sources and challenges associated with location prediction, and provided an overview of the location prediction framework.

Trajectory Data Preprocessing
Based on the above general framework, we briefly introduce some common trajectory data preprocessing methods in this section. First, we present some common basic data cleaning methods, including noise filtering, stay point detection, and trajectory compression. Then, we introduce some special trajectory data preprocessing procedures, which typically involve Places Of Interest (POIs) identification, trajectory segmentation, and trajectory semantics. Finally, we introduce methods for extracting features from trajectory data.

Traditional trajectory data preprocessing often involves noise filtering, stay point detection, and trajectory compression.
Noise filtering addresses data inaccuracy problems when the sampled trajectory data from positioning equipment contains unexpected mistakes, such as sensor noise.Existing methods are mainly categorized into three types: mean (or median) filters, Kalman filters, and particle filters.
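As an illustration of the first category, the following is a minimal median-filter sketch, assuming the points are already in a planar coordinate system; `median_filter` and its `window` parameter are hypothetical names, not taken from Ref. [11].

```python
from statistics import median
from typing import List, Tuple


def median_filter(points: List[Tuple[float, float]], window: int = 3) -> List[Tuple[float, float]]:
    """Smooth (x, y) points with a centred sliding-window median.

    `window` must be a positive odd number of points; near the ends of the
    trajectory the window is truncated to the available points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((median(xs), median(ys)))
    return smoothed
```

Compared with a mean filter, the median is less affected by a single outlying fix: in the trajectory `[(0, 0), (1, 1), (50, 50), (3, 3), (4, 4)]`, the spike at index 2 is replaced by `(3, 3)`.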
A stay point is a trajectory point at which the geographical position does not change over a relatively long period of time. Stay points usually have a special meaning for the moving objects, such as a work place, a restaurant, or a nest (for animals). The earliest stay point detection algorithms were proposed by Li et al. [27] and Zheng [28]. Many stay point detection algorithms are based on the concepts of density and nearest neighbours.
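A common way to operationalize this definition uses a distance threshold and a time threshold. The sketch below assumes planar coordinates in metres and time stamps in seconds; the threshold values and the function name `detect_stay_points` are illustrative assumptions, not parameters from the cited works.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t): planar metres and seconds (assumed)


def detect_stay_points(traj: List[Point], dist_thresh: float = 200.0,
                       time_thresh: float = 20 * 60) -> List[Tuple[float, float, float, float]]:
    """Return (mean_x, mean_y, arrival_t, leave_t) for each detected stay point."""
    stay_points = []
    i, n = 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the window while points remain within dist_thresh of the anchor p_i.
        while j < n and math.dist(traj[i][:2], traj[j][:2]) <= dist_thresh:
            j += 1
        # Points i..j-1 stayed near p_i; keep them if the stay lasted long enough.
        if traj[j - 1][2] - traj[i][2] >= time_thresh:
            xs = [p[0] for p in traj[i:j]]
            ys = [p[1] for p in traj[i:j]]
            stay_points.append((sum(xs) / len(xs), sum(ys) / len(ys), traj[i][2], traj[j - 1][2]))
            i = j  # continue scanning after the stay
        else:
            i += 1
    return stay_points
```

Each stay point is summarized by the centroid of the points it covers together with its arrival and leaving times.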
At present, the volume of trajectory data is rapidly increasing due to the popularity of positioning devices, and storing and analyzing such data is costly. Trajectory data compression extracts the key information from trajectory data, i.e., a small set of key points that briefly represent the trajectory. Existing methods can be divided into three categories: off-line, on-line, and semantic compression.
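For off-line compression, a classic example is the Douglas-Peucker algorithm, which recursively keeps the point farthest from the chord between the endpoints whenever that distance exceeds a tolerance. The sketch below is a textbook version on planar coordinates; it is shown as a representative off-line method and is not claimed to be the method of Ref. [36].

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def _point_segment_distance(p: XY, a: XY, b: XY) -> float:
    """Distance from p to the segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Projection parameter of p onto ab, clamped to the segment.
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + s * dx, ay + s * dy))


def douglas_peucker(points: List[XY], epsilon: float) -> List[XY]:
    """Keep only the key points whose removal would introduce an error above epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

For an L-shaped trajectory such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]`, only the two endpoints and the corner `(2, 0)` survive.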
Table 1 summarizes some of the main data cleaning methods. For details, the reader is referred to the corresponding references.

Table 1 Data cleaning methods.
Data cleaning | Method
Noise filtering | Mean (or median) filters [11]; Kalman filters [15,29]; Particle filters [30,31]
Stay point detection | Density clustering [32]; Nearest neighbour [33,34]; Content-based [35]
Trajectory compression | Off-line compression [36]; On-line compression [17,37]; Semantic compression [13,28]

Preprocessing for location prediction
After data cleaning, further preprocessing is usually necessary for location prediction. For example, the goal in most applications is to determine the next location at which a person will arrive, rather than specific GPS coordinates. In this scenario, trajectory preprocessing mainly includes map matching, POIs identification, trajectory segmentation, and trajectory semantics. Urban trajectory data can be used to map a road network [38]. This is beneficial for analyzing the paths taken by people, and it also provides a new representation of trajectory data, namely a graph. Map matching techniques map trajectory data onto a road network. Current mainstream matching algorithms are either context-based [39,40] or integrate additional information [41,42].
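The simplest possible form of map matching snaps each point to the geometrically nearest road segment. The sketch below shows only this naive baseline under assumed planar coordinates and a hypothetical `roads` dictionary of straight segments; it ignores the sequential context and additional information that the algorithms cited above exploit, so it is easily fooled near intersections and parallel roads.

```python
import math
from typing import Dict, List, Tuple

XY = Tuple[float, float]
Segment = Tuple[XY, XY]


def _dist_to_segment(p: XY, a: XY, b: XY) -> float:
    """Distance from p to the segment ab (planar coordinates)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    s = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (a[0] + s * dx, a[1] + s * dy))


def match_to_nearest_segment(points: List[XY], roads: Dict[str, Segment]) -> List[str]:
    """Assign each GPS point the id of the closest road segment (no sequence context)."""
    return [min(roads, key=lambda rid: _dist_to_segment(p, *roads[rid])) for p in points]
```
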
POIs are of great concern in urban areas and are also referred to as significant locations. POIs denote frequently visited locations or locations near high-density trajectories. Currently, POI data can be divided into two types. The first type includes well-known public places such as malls, government offices, and transportation centers, to mention a few. Unfortunately, this type of data is often scarce. The second type is derived from high-density trajectory data, in which POIs are not explicitly labeled, so a cluster-based algorithm or hierarchical method must be used to identify them. Figure 3 outlines the process of identifying significant locations (POIs) via hierarchical clustering methods [43], where X is the center of the cluster and the dots indicate GPS data. In the figure, the white dots are the GPS data in a cluster, and the dots within the dotted line represent data in the previous cluster. The basic idea of such clustering algorithms is to mark a place and a radius. GPS data points within this radius are recognized as the same POI, and a new center is computed as the mean of these GPS data points. This process is repeated until the set of points within the radius no longer changes. Note that POIs differ from stay points: a stay point is a cluster of points within a trajectory, whereas POIs are key places of interest.
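The radius-and-mean procedure just described can be sketched as follows. Two details are assumptions not fixed by the text: the search starts from the first unassigned point, and clustered points are removed before the next POI is sought. The hierarchical aspect of Ref. [43] is not reproduced, and planar coordinates are assumed.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def find_significant_locations(points: List[XY], radius: float,
                               max_iter: int = 100) -> List[Tuple[XY, List[XY]]]:
    """Return (center, members) for each significant location found."""
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining[0]  # mark a place (assumed: first unassigned point)
        members: List[XY] = []
        for _ in range(max_iter):
            new_members = [p for p in remaining if math.dist(p, center) <= radius]
            if new_members == members:  # the set inside the radius no longer changes
                break
            members = new_members
            # Move the center to the mean of the points inside the radius.
            center = (sum(p[0] for p in members) / len(members),
                      sum(p[1] for p in members) / len(members))
        clusters.append((center, members))
        remaining = [p for p in remaining if p not in members]
    return clusters
```

The membership set is never empty: the mean of the current members is, on average, at least as close to them as the previous center was, so at least one member stays inside the radius and every pass of the outer loop removes some points.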
Fig. 3 Process for identifying significant locations using cluster methods [43].

In some scenarios, the local trajectory is more important than the whole trajectory, so trajectory segmentation is introduced. Trajectory segmentation methods either find key points, minimize a specific cost function, or apply rules. Early studies identified inflection points or points with a distinct change in angle as key points. Since then, more mature techniques have been proposed. For example, Lee et al. [19] proposed a trajectory segmentation approach that uses the Minimum Description Length (MDL) principle to minimize the cost of coding the trajectory.
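The early key-point idea (splitting at points where the direction of movement changes sharply) can be sketched as below. The 45-degree threshold and the function names are illustrative assumptions; this is not the MDL-based method of Lee et al. [19].

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def _heading(a: XY, b: XY) -> float:
    """Direction of travel from a to b, in radians."""
    return math.atan2(b[1] - a[1], b[0] - a[0])


def segment_by_turning_angle(points: List[XY], max_turn_deg: float = 45.0) -> List[List[XY]]:
    """Split a trajectory at key points where the heading changes by more than max_turn_deg."""
    if len(points) < 3:
        return [list(points)]
    segments, start = [], 0
    for i in range(1, len(points) - 1):
        turn = _heading(points[i], points[i + 1]) - _heading(points[i - 1], points[i])
        # Wrap the turning angle into [-pi, pi) before comparing.
        turn = (turn + math.pi) % (2 * math.pi) - math.pi
        if abs(math.degrees(turn)) > max_turn_deg:
            segments.append(points[start:i + 1])  # key point i ends this segment
            start = i  # and starts the next one
    segments.append(points[start:])
    return segments
```

Each key point is shared by the two segments it separates, so the segments can be concatenated back into the original trajectory.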
Trajectory segmentation is essential in local trajectory trend analysis and local trajectory clustering.
Trajectory data are closely related to the behavior of moving objects, especially humans [13,44,45]. Trajectory semantics identifies semantic information about object movement to explain human behavior, identify semantic patterns, and offer a new perspective for exploring trajectory data. A number of researchers have enriched trajectories with semantic information via data mining. Trajectory semantics facilitates location prediction through its ability to interpret human behaviour. For more details, the reader is referred to the review of trajectory semantics [46].
In this subsection, we introduced several new trajectory representation techniques: map matching, which represents trajectories on a graph; POI identification, which represents trajectories as sequences of POIs; trajectory segmentation, which splits trajectories into a set of segments; and trajectory semantics, which gives trajectories a semantic representation.

Trajectory data features
In contrast to other data, extracting features from trajectory data is a non-trivial task. Some studies define the start point, end point, moving speed, and time length as features. Wang et al. [47] proposed new features for trajectory data, consisting of spatial mobility patterns, text content, individual temporal patterns, social relationships, collaborative filtering, and heterogeneous mobility datasets. Social relationships describe the social networks in Location-Based Social Networks (LBSNs), from which the extracted features include the number of locations, user entropy, location entropy, and visitation ratio. Figure 4 shows some features in the Gowalla and Jiepang datasets [48]. Heterogeneous mobility datasets include bus, taxi, check-in, and daily-life mobility datasets. Spatial mobility patterns, individual temporal patterns, and text content are extracted from individual trajectories via data mining or machine learning, and collaborative filtering techniques are used to find similar users or locations. These features provide unique insights for exploring trajectories.
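The text does not define user entropy and location entropy. A common definition, assumed here, is the Shannon entropy (natural logarithm) of a visit distribution: location entropy spreads a location's check-ins over the users who made them, and user entropy spreads a user's check-ins over the locations visited. The function names below are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def _entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (natural log) of a non-empty collection of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def location_and_user_entropy(checkins: Iterable[Tuple[str, str]]):
    """checkins: (user, location) pairs. Returns (location_entropy, user_entropy) dicts."""
    by_location: Dict[str, Counter] = defaultdict(Counter)
    by_user: Dict[str, Counter] = defaultdict(Counter)
    for user, loc in checkins:
        by_location[loc][user] += 1
        by_user[user][loc] += 1
    loc_entropy = {loc: _entropy(c.values()) for loc, c in by_location.items()}
    user_entropy = {u: _entropy(c.values()) for u, c in by_user.items()}
    return loc_entropy, user_entropy
```

A location visited equally by many users has high entropy (a public place), whereas a location visited by a single user has zero entropy (e.g., a home).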

Location Prediction Models
In this section, we provide a summary of current location prediction algorithms, introduce some representative studies, and discuss the categorization of prediction models.

Location prediction algorithms
Here, we introduce some representative location prediction algorithms.Table 2 lists the main symbols used.

Content-based methods
Content-based methods learn the content correlation or location transition probability, based on the assumption that the current user location is related to the previous location.To this end, researchers often construct a data structure to store content, then match the content to predict location [49,50] .The Markov model is a typical strategy that uses content information to predict a future state.
Song et al. [43] studied mobile phone positioning data derived from Wi-Fi signals, using a two-year trajectory trace of more than 6000 users on Dartmouth College's campus-wide Wi-Fi wireless network. The authors compared two prediction methods: Markov-based and compression-based location predictors. A Markov model is a typical sequence analysis method in which a Markov chain models a user's location history through the user's transition probabilities from one location to another. In this model, it is assumed that a user's next location depends on his current location: if a user arrives at a given location, he is likely to move next to the locations that have historically followed it. A k-order Markov model predicts a user's location from his k most recent locations. If a user is currently at the n-th location, his k most recent locations are $(l_{n-k+1}, \dots, l_n)$. Formally, if a user's trajectory history is $l_1, l_2, l_3, \dots, l_{n-k+1}, \dots, l_n$, the k-order context is $c = (l_{n-k+1}, \dots, l_n)$, and $L$ is the set of locations, then the Markov model is defined as shown in Eq. (1):

$$
\begin{aligned}
P(l_{n+1} = l \mid l_1, \dots, l_n) &= P\bigl(l_{n+1} = l \mid (l_{n-k+1}, \dots, l_n) = c\bigr) \\
&= P\bigl(l_{i+k+1} = l \mid (l_{i+1}, \dots, l_{i+k}) = c\bigr), \quad l \in L
\end{aligned}
\tag{1}
$$

where $P(l_{n+1} = l \mid \cdot)$ is the probability of the user arriving at the next place $l$. The first line of Eq. (1) is the Markov assumption, and the second line indicates that the probability of arriving at a location is the same wherever the same context occurs in the history. Moreover, if a location has never followed the context in the history, its arrival probability is zero. The other content-based method uses compression-based location predictors built on a popular incremental parsing algorithm for text compression. In this method, a trajectory is first partitioned into distinct sub-trajectories $st_0, st_1, \dots, st_m$. Let $st_0 = \emptyset$; for $j > 0$, removing the last location of sub-trajectory $st_j$ yields some earlier sub-trajectory $st_i$, where $0 \le i < j$, and $st_1 st_2 \cdots st_m = Tra$. For example, partitioning the trajectory gbdcbgcefbdbde by this rule yields the sub-trajectory set {g, b, d, c, bg, ce, f, bd, bde}. A tree is then built to store these sub-trajectories, in which the root node is $\emptyset$ and each child node is a sub-trajectory annotated with its occurrence count. Figure 5 shows an example.
For each location $l$ in the location set, Eq. (2) is used to compute the occurrence probability, where $N(st_m, l)$ denotes the number of times $st_m$ occurs as a prefix of $st_m l$ in the sub-trajectory set.
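The incremental (LZ78-style) parsing and the counting tree of Fig. 5 can be sketched as follows. Because Eq. (2) is not reproduced in this text, the probability function adopts an assumed reading consistent with the description of $N(st_m, l)$: the count of the extended prefix divided by the count of the current context. Under this reading the probabilities need not sum to one, because some phrases end exactly at the context node. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Phrase = Tuple[Hashable, ...]


def lz78_parse(trajectory: Sequence[Hashable]) -> List[Phrase]:
    """Split a location sequence into sub-trajectories st_1, ..., st_m (LZ78 phrases)."""
    seen = set()
    phrases: List[Phrase] = []
    current: Phrase = ()
    for loc in trajectory:
        current += (loc,)
        if current not in seen:  # shortest continuation not parsed before
            seen.add(current)
            phrases.append(current)
            current = ()
    return phrases  # a trailing, already-seen fragment is not emitted


def build_prefix_tree(phrases: List[Phrase]) -> Dict[Phrase, int]:
    """Node counts of the tree: N(s) = number of phrases that have s as a prefix."""
    counts: Dict[Phrase, int] = defaultdict(int)
    for phrase in phrases:
        for end in range(len(phrase) + 1):  # end = 0 is the empty root
            counts[phrase[:end]] += 1
    return dict(counts)


def next_location_proba(counts: Dict[Phrase, int], context: Phrase) -> Dict[Hashable, float]:
    """Assumed reading of Eq. (2): P(l) = N(context + l) / N(context)."""
    parent = counts.get(context, 0)
    if parent == 0:
        return {}
    return {p[-1]: n / parent for p, n in counts.items()
            if len(p) == len(context) + 1 and p[:-1] == context}
```

Parsing the example trajectory `"gbdcbgcefbdbde"` reproduces the nine sub-trajectories listed above, and the context `("b",)` yields `{"g": 0.25, "d": 0.5}`.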
Unlike Markov-based predictors, which use a fixed-length context of k locations, compression-based predictors depend on the occurrence counts of location prefixes. Song et al. [43] compared the performance of these two methods in predicting user locations from Wi-Fi data and found that low-order Markov predictors were more accurate than the more complex compression-based predictors. Subsequent research focused on determining the transition probability between different locations. The transition probability is used in both Markov-based and compression-based predictors, and it is a prototype of the Spatial Mobility Pattern (SMP) feature, which provides a new way of investigating changes in movement patterns over time.

Fig. 5 Example of a tree storing sub-trajectories based on compression [43].
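The order-k Markov predictor compared above can be sketched by counting, for every context of k consecutive locations, which location followed it, and normalizing the counts. This is a minimal maximum-likelihood version; the class name and interface are hypothetical, and it returns an empty result for contexts that never occurred in the history.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


class MarkovPredictor:
    """Order-k Markov location predictor estimated from transition counts."""

    def __init__(self, k: int = 1):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.counts: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def fit(self, history: Sequence[Hashable]) -> "MarkovPredictor":
        # Count how often each k-location context c is followed by each next location.
        for i in range(len(history) - self.k):
            context = tuple(history[i:i + self.k])
            self.counts[context][history[i + self.k]] += 1
        return self

    def predict_proba(self, recent: Sequence[Hashable]) -> Dict[Hashable, float]:
        """P(l_{n+1} = l | c) for the last k locations; empty if c was never observed."""
        context = tuple(recent[-self.k:])
        followers = self.counts.get(context)
        if not followers:
            return {}
        total = sum(followers.values())
        return {loc: n / total for loc, n in followers.items()}
```

On the example trajectory `"gbdcbgcefbdbde"`, an order-1 model predicts `d` with probability 0.75 and `g` with probability 0.25 after `b`, because `b` is followed three times by `d` and once by `g`.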
Similar to Song et al. [43], Ashbrook and Starner [51] proposed a location prediction model based on a Markov chain, which they called the User Location Pattern (ULP) model and which has been widely applied in many scenarios. In this model, GPS data are first converted into significant locations, which are then input to a Markov model to predict locations. The difference between this model and that of Song et al. [43] is that ULPs are applied to both single and collaborative users. The ULP model identifies significant locations by clustering; the example in Fig. 3 shows a location clustering algorithm used to identify significant locations. A Markov model is then built to predict the next location. Figure 6 shows a Markov model of the travel routes between home, the Centennial Research Building (CRB), and the Department of Veterans Affairs (VA).
Markov models are occasionally combined with recommendation methods to build a location prediction system. For example, Lian et al. [48] proposed the Collaborative Exploration and Periodically Returning (CEPR) model to overcome the limitation of predicting locations based only on context. Human movement patterns, as reflected in individual mobility trajectories, reveal a human preference for exploring new locations, and Lian et al. [48] designed CEPR to predict user locations based on this tendency. First, the authors studied the unique characteristics of human movement behaviour and proposed a novel solution: exploration prediction, which estimates whether the next visited location is new. Lian et al. [48] used a binary classifier to label a movement as either an exploration of a new location or a return to a previously visited location. Their experiments yielded a 20% classification error rate on two check-in datasets comprising $6 \times 10^6$ and $3.6 \times 10^7$ records, respectively. Figure 7 shows the probability of return and the ratio of novel check-ins to all check-in locations in the two datasets.

Fig. 6 Examples of different location state transitions in a Markov model [51].
Figure 4 shows the correlation between novelty and various features. Using this exploration classification, the CEPR model integrates location prediction and location recommendation algorithms into a single location predictor: a Markov model is used for location prediction, and a location recommendation algorithm is used to predict novel locations.
Equation (3) shows the CEPR framework, where $\Pr(\mathrm{Explore}) \in [0, 1]$ is the exploration rate given by the binary classifier, $P_r(l)$ is the probability that a user arrives at location $l$, based on the prediction algorithm, and $P_n(l)$ is the probability that a user explores the novel location $l$, based on the recommendation algorithm, with $P_r(l), P_n(l) \in [0, 1]$. $\Pr(\mathrm{Explore})$ acts as a soft switch between $P_r(l)$ and $P_n(l)$:

$$
P(l) = \Pr(\mathrm{Explore})\, P_n(l) + \bigl(1 - \Pr(\mathrm{Explore})\bigr)\, P_r(l) \tag{3}
$$

Content-based methods determine the correlation between locations. These methods have two disadvantages: (1) locations that a user has never visited do not appear in the user's prediction list, and (2) similar contexts always lead to the same predicted location.
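The soft switch of Eq. (3) amounts to a convex combination of two distributions. In the sketch below, the exploration rate and both distributions are plain inputs (in CEPR they would come from the binary classifier, the Markov predictor, and the recommender, respectively); the function name and the location labels are hypothetical.

```python
from typing import Dict, Hashable


def cepr_mixture(p_explore: float, p_return: Dict[Hashable, float],
                 p_novel: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """P(l) = Pr(Explore) * P_n(l) + (1 - Pr(Explore)) * P_r(l), as in Eq. (3)."""
    if not 0.0 <= p_explore <= 1.0:
        raise ValueError("p_explore must lie in [0, 1]")
    locations = set(p_return) | set(p_novel)
    return {l: p_explore * p_novel.get(l, 0.0) + (1.0 - p_explore) * p_return.get(l, 0.0)
            for l in locations}
```

With `p_explore = 0.2`, a return distribution `{"home": 0.5, "work": 0.5}`, and a novel distribution `{"new_bar": 1.0}`, the mixture assigns 0.4 to each familiar place and 0.2 to the new one, and still sums to one.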

Distribution-based methods
Distribution-based methods model user movement as a distribution with location and time as two random variables. In these methods, the probabilities of the random variables are computed and then ranked to predict a location.

Fig. 7 (a) Probability of return; (b) ratio of novel check-in locations to all check-in locations in the Gowalla and Jiepang datasets [48].

Cho et al. [52] discovered that human movement in daily life is influenced by two factors: geographical limitations and social relations. In their collected dataset, the authors identified a peculiar phenomenon whereby the social network structure has little effect on the spatial and temporal patterns of human short-range travel. Furthermore, social relationships account for just 10%-30% of people's daily movement, and periodic behaviour accounts for 50%. Based on these findings, the authors proposed a location prediction method, the Periodic and Social Mobility Model (PSMM), which describes human mobility in terms of periodic short-range travel and social network structure. The authors determined the home distance distributions of friends and of all users, the distance between 200 large cities, and the probability of friendship as a function of distance; the results are shown in Fig. 8.
Figure 9 shows that the work place and home are the primary places in human daily life: most people visit their work place during weekday daytime hours and their place of residence on weeknights and weekend daytimes. Based on this finding, the authors proposed the Periodic Mobility Model (PMM) for predicting future user location states, based on a partition of all locations visited by a user into work-related and home-related ones. First, PMM infers the geographic centres of two latent locations (home and work) for each user and then models each of them with a Gaussian distribution.

Fig. 9 Temporal distribution of check-ins at home or at the work place, denoted by red and blue lines, respectively [52].
Then, PMM determines the probability that a user is at home or at the work place as a function of the time of day. Equations (4) and (5) compute the probabilities of these user states from their Gaussian distributions, where the parameters of each Gaussian are the average time of day at which the user is in that location state and the variance of that time of day.
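Because Eqs. (4) and (5) are not reproduced here, the following is only a simplified sketch of the idea, not the exact PMM formulation. Each state (home or work) is modeled by a prior-weighted Gaussian over the time of day, and the state probability is obtained by normalizing the two. The wrap-around at midnight, the prior weights, and the parameter names are assumptions.

```python
import math


def _time_diff_hours(t: float, tau: float) -> float:
    """Smallest signed difference between two times of day, in hours (wraps at 24)."""
    return (t - tau + 12.0) % 24.0 - 12.0


def prob_home(t: float, tau_h: float, sigma_h: float, prior_h: float,
              tau_w: float, sigma_w: float, prior_w: float) -> float:
    """P(state = home | time of day t) from two prior-weighted Gaussians over time of day."""
    def weighted_gaussian(tau: float, sigma: float, prior: float) -> float:
        d = _time_diff_hours(t, tau)
        return prior / math.sqrt(2 * math.pi * sigma ** 2) * math.exp(-d ** 2 / (2 * sigma ** 2))

    n_h = weighted_gaussian(tau_h, sigma_h, prior_h)
    n_w = weighted_gaussian(tau_w, sigma_w, prior_w)
    return n_h / (n_h + n_w)  # the work probability is 1 minus this value
```

For a user whose home check-ins centre on 22:00 and work check-ins on 13:00 (both with a 3-hour spread and equal priors), the model is almost certain the user is at home at 22:00 and at work at 13:00.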
To integrate social information into PMM, Cho et al. [52] introduced a check-in classification variable $z_u(t)$ into PMM: $z_u(t) = 1$ indicates that the check-in location is related to the user's social network, and $z_u(t) = 0$ indicates that it is not. The authors first determined the correlation between travel distance and social information, and then modeled the spatiotemporal pattern and social information to predict location. However, this method lacks a detailed description of spatiotemporal patterns and social structure.

Pattern-based methods
Pattern-based methods draw on several types of patterns, including those of Refs. [53,54], frequency patterns [35], integrated data patterns [55], and periodic patterns [26]. These methods extract spatiotemporal patterns from trajectories to predict locations.
Here, we describe two predictors based on spatiotemporal patterns. Monreale et al. [56] proposed a predictor based on the T-pattern, a dynamic pattern model mined from GPS trajectory data [53]. First, the authors used nearest neighbour methods to dynamically analyze the density distribution and extracted a set of dense regions of interest. Then, they constructed temporally annotated sequences and computed their relationship to the spatiotemporal pattern within a given time tolerance. Figure 10 shows an example of a T-pattern. After extracting the T-patterns, Monreale et al. [56] stored them in a prefix-tree structure called the T-tree. In the prediction process, this method computes a score that includes a path score and a punctuality score. The space tolerance, time tolerance, path score, and punctuality score are then integrated to make a prediction.
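The core matching step (checking whether a user's recent, region-labelled visits follow the beginning of a T-pattern within a time tolerance) can be sketched as follows. This is a deliberately simplified illustration: it assumes points have already been mapped to regions (so the space tolerance is already applied), stores patterns in a flat list rather than a T-tree, and ranks candidates by a hypothetical `support` count instead of the path and punctuality scores of Monreale et al. [56].

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TPattern:
    regions: Tuple[Hashable, ...]  # r_0, r_1, ..., r_n
    gaps: Tuple[float, ...]        # typical transition time between consecutive regions
    support: int                   # hypothetical ranking key (not from the original method)


def prefix_matches(pattern: TPattern, observed: Sequence[Tuple[Hashable, float]],
                   time_tol: float) -> bool:
    """True if the (region, time) observations follow a proper prefix of the pattern."""
    if not observed or len(observed) >= len(pattern.regions):
        return False
    if any(region != pattern.regions[i] for i, (region, _) in enumerate(observed)):
        return False
    for i in range(len(observed) - 1):
        elapsed = observed[i + 1][1] - observed[i][1]
        if abs(elapsed - pattern.gaps[i]) > time_tol:  # outside the time tolerance
            return False
    return True


def predict_next_region(patterns: List[TPattern], observed: Sequence[Tuple[Hashable, float]],
                        time_tol: float) -> Optional[Hashable]:
    """Next region of the best-supported matching pattern, or None if nothing matches."""
    candidates = [p for p in patterns if prefix_matches(p, observed, time_tol)]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: p.support)
    return best.regions[len(observed)]
```
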
Lee et al. [57] also extracted spatiotemporal patterns but, distinct from the T-pattern, proposed the Gapped SpatioTemporal-Periodic (GSTP) pattern, using gapped sequence mining to build a pattern-based location prediction system. Figure 11 shows the training process of the proposed framework, for which the data source is smartphone log data. After data cleaning, the authors extracted the GSTP. This pattern extraction process involves computing the time range in which users stay at a unique place and the stay probability, based on the SpatioTemporal-Periodic (STP) pattern, and then using the GSTP to compute the time gap and transition probability between different locations. The resulting GSTP trajectory is used to forecast the next location. This simple pattern describes the time gap and time period of a user's movement. Its disadvantage is that it is based only on observed spatial and temporal information and lacks a detailed description of movement patterns, such as the location visitation frequency.

Fig. 10 Example of T-pattern in temporally annotated trajectory [53].
Fig. 11 Extracting process of spatiotemporal pattern and predictor training process [57].
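The two quantities the GSTP attaches to location pairs, the transition probability and the time gap, can be estimated from a sequence of stays as sketched below. The input format `(place, arrival time, departure time)` and the function name are assumptions, and the gapped sequence mining of Lee et al. [57] itself is not reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Visit = Tuple[Hashable, float, float]  # (place, arrival time, departure time), assumed format


def transitions_with_gaps(visits: List[Visit]):
    """Transition probabilities P(b | a) and mean time gaps between consecutive stays."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    gaps: Dict[Tuple[Hashable, Hashable], List[float]] = defaultdict(list)
    for (a, _, left_a), (b, arrive_b, _) in zip(visits, visits[1:]):
        counts[a][b] += 1
        gaps[(a, b)].append(arrive_b - left_a)  # time between leaving a and reaching b
    probs = {}
    for a, c in counts.items():
        total = sum(c.values())
        probs[a] = {b: n / total for b, n in c.items()}
    mean_gaps = {pair: sum(g) / len(g) for pair, g in gaps.items()}
    return probs, mean_gaps
```

For the stays home, work, home, work, gym, this gives $P(\text{work} \mid \text{home}) = 1$ and splits the transitions out of work evenly between home and the gym.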
In contrast to the GSTP, Monreale et al. [56] considered both time and space tolerance and used a dedicated data structure to store the T-patterns, which is beneficial for predicting locations. Although pattern-based methods are common, extracting meaningful patterns is non-trivial. Pattern-based methods also tend to be overly strict and lack diversity in the predicted locations.

Preference-based methods
Mobility preference is an important factor in predicting user location.Many studies have focused on user mobility preferences as a basis for predicting location.There are a number of methods available for determining user preference, with matrix-based methods being the most popular.
User location history can be used to generate a matrix, and matrix factorization can then be used to capture user movement preferences. A tensor is an extension of a matrix. Bhargava et al. [58] used tensor factorization in multi-dimensional collaborative filtering to predict location from user profiles, users' short messages in social networks, and user location and temporal information. Human movement is related to user preferences, user activities, and user spatiotemporal patterns, and using such additional information can improve the overall prediction outcome. Multi-source data fusion and collaborative filtering are key techniques. To address the sparsity problem in check-in data, Bhargava et al. [58] used tensor factorization to perform multi-dimensional collaborative filtering. They jointly analyzed the constructed tensor and matrices and formulated the objective function shown in Eq. (7).
where $W$ is a weight tensor, $X$ is a tensor, $U$ is a user matrix, $L$ is a location matrix, $A$ is an activity matrix, $T$ is a time matrix, $\|\cdot\|^2$ is the squared Frobenius norm, $\circ$ is the outer product, and $\lambda_1, \dots, \lambda_5$ are the model parameters. This method integrates user, activity, location, and temporal information to predict location by tensor factorization. Its drawback is that it is typically both resource and time consuming.
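The basic matrix case mentioned at the start of this subsection can be sketched with stochastic gradient descent on the observed entries of a user-location matrix. This is plain regularized matrix factorization, not the coupled tensor factorization of Bhargava et al. [58]; the rank, learning rate, regularization weight, and function names are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def predict(U: Matrix, L: Matrix, u: int, l: int) -> float:
    """Preference score of user u for location l: the dot product U[u] . L[l]."""
    return sum(a * b for a, b in zip(U[u], L[l]))


def factorize(observed: Dict[Tuple[int, int], float], n_users: int, n_locations: int,
              rank: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 2000, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit U (users x rank) and L (locations x rank) to the observed entries by SGD."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    L = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_locations)]
    entries = sorted(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, l), value in entries:
            err = value - predict(U, L, u, l)
            for f in range(rank):
                uf, lf = U[u][f], L[l][f]
                # Gradient step on the squared error plus L2 regularization.
                U[u][f] += lr * (err * lf - reg * uf)
                L[l][f] += lr * (err * uf - reg * lf)
    return U, L
```

After fitting, `predict` also scores the unobserved user-location pairs, which is how factorization fills in a sparse visit matrix.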
Single-type mobility datasets often contain limited information. For example, check-in datasets include only check-in behaviour and social relationships. Heterogeneous mobility datasets include bus, taxi, and daily-life mobility data, among others, and the use of various data sources provides new perspectives on movement patterns. Wang et al. [47] proposed a Regularity Conformity Heterogeneous (RCH) model for analyzing heterogeneous datasets and used a gravity model to determine the spatial influence. The RCH model assumes that human movement is affected by both a regularity term and a conformity term, and it splits the geospatial space into many grid cells, as shown in Fig. 12.
Figure 12 associates each venue with a grid cell; a plus sign means that the user has visited the venue. RCH computes the probability of user $u_i$ visiting venue $v_j$, denoted $\Pr(v_j \mid u_i)$, from three factors. The first factor is the visiting frequency of a grid cell, as shown in Figs. 12a and 12c. The second factor is the transition probability between grid cells, which captures geospatial influence and is determined by the gravity model. Figures 13a and 13b show the spatial influences of a bar district and an IT district, respectively. In contrast to traditional methods, the gravity model is trained on heterogeneous trajectory data, such as taxi, bus, and check-in data, which enriches it with daily-life and commuting information. The third factor is the venue visiting frequency. The regularity term is computed using Eq. (8).

Fig. 12 Method for determining regularity term [47].

Fig. 13 Spatial influence of a location using a gravity model based on taxi, bus, and check-in datasets. (a) Point A is a bus district, (b) point B is an IT district [47].
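The gravity-model factor can be illustrated with a minimal sketch. Here the influence of a destination cell on a source cell is taken to be proportional to the product of the two cells' masses (hypothetically, their pooled visit counts from taxi, bus, and check-in records) divided by a power of their distance, normalized into transition probabilities. The regularity score multiplies the probability of reaching a venue's cell by the venue's share of visits in that cell; this product form is an assumption, and Eq. (8) in [47] may combine the three factors differently. Grid size, `beta`, and all function names are hypothetical.

```python
import math
from collections import Counter


def to_cell(lat, lon, origin=(0.0, 0.0), size=0.01):
    """Map a coordinate to an integer (row, col) grid cell of `size` degrees."""
    return (int((lat - origin[0]) // size), int((lon - origin[1]) // size))


def gravity_transition(masses, beta=2.0):
    """Row-normalized P(dest | src), proportional to m_src * m_dest / (1 + d) ** beta.

    Distances are in cell units; the (1 + d) smoothing lets a cell attract
    visits to itself, which is an assumption of this sketch.
    """
    probs = {}
    for src in masses:
        scores = {dst: masses[src] * masses[dst] / (1.0 + math.dist(src, dst)) ** beta
                  for dst in masses}
        total = sum(scores.values())
        probs[src] = {dst: s / total for dst, s in scores.items()}
    return probs


def regularity(user_cells, transition, venue_counts, venue_cell):
    """Score venues: P(reach the venue's cell) * venue's share of visits in that cell."""
    n = sum(user_cells.values())
    cell_totals = Counter()
    for venue, count in venue_counts.items():
        cell_totals[venue_cell[venue]] += count
    scores = {}
    for venue, count in venue_counts.items():
        cell = venue_cell[venue]
        # Cell visiting frequency of the user, weighted by gravity transitions.
        reach = sum(c / n * transition[src][cell] for src, c in user_cells.items())
        scores[venue] = reach * count / cell_totals[cell]
    return scores
```

Because each row of the transition table sums to 1 and venue shares sum to 1 within a cell, the regularity scores form a probability distribution over venues whenever every cell in the table contains at least one venue.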
RCH describes the conformity term using a time-changing matrix factorization model, as shown in Eq. (9). Combining the regularity and conformity terms defines the RCH model in Eq. (10), and the final objective function of RCH is given by Eq. (11). By modeling both terms, RCH considers not only individual preferences through the regularity term but also the influence of other users who are similar to the user through the conformity term. In addition, it was the first model to incorporate heterogeneous mobility data into spatial influence analysis.

The above approaches capture user or public preferences via matrix or tensor factorization. Like pattern-based methods, preference learning can extract user movement behaviour, but it shares their disadvantages: most preference-based methods ignore the evolution of preferences and the randomness of mobility.

Social relation-based methods
Social relation-based methods use social relations to model user movement and then use these relations to infer the future locations a given user will visit. Previous works have often depended on historically visited locations; however, when a user's data contains few previous locations, such methods cannot produce useful predictions. Social relation features can alleviate the cold-start and data sparsity problems in location prediction models.
Gao et al. [59] proposed a geo-social correlation (gSCorr) model that uses social information to solve the cold-start problem. The gSCorr model captures the correlation between social networks and geographical distance and defines four combinations of social relationship and distance, as listed in Table 3, where $F$ denotes a friend in the social network, $\bar{F}$ denotes a user who is not an observed friend, $D$ denotes a long geographical distance, and $\bar{D}$ denotes a short distance. For example, $S_{F\bar{D}}$ describes a user's social-network friends who live a short distance away. Figure 14 shows a user's check-in behaviour under the different geo-social correlations. gSCorr predicts the probability $P_u^t(l)$ of user $u$ checking in at a new location $l$ (Eq. (12)), where four parameters, indexed 1 to 4, govern the strength of each geo-social correlation, $P_u^t(l \mid S_x)$ is the probability of the user checking in at location $l$ in the gSCorr model, and $S_x$ is one of the user's geo-social circles.

Table 3 Geo-social correlation [59].
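A minimal sketch of the mixture behind Eq. (12) follows, assuming that the four parameters act as nonnegative mixture weights over the four geo-social circles and that each $P_u^t(l \mid S_x)$ is estimated as the fraction of check-ins at $l$ made by members of circle $S_x$. The 50 km distance threshold, the circle names, and the helper functions are hypothetical.

```python
from collections import Counter

CIRCLES = ("local_friends", "local_nonfriends", "distant_friends", "distant_nonfriends")


def circle_of(is_friend, distance_km, threshold_km=50.0):
    """Assign another user to a geo-social circle; the threshold is an assumption."""
    local = distance_km < threshold_km
    if is_friend:
        return "local_friends" if local else "distant_friends"
    return "local_nonfriends" if local else "distant_nonfriends"


def circle_counts(others):
    """others: iterable of (is_friend, distance_km, visited_locations)."""
    counts = {c: Counter() for c in CIRCLES}
    for is_friend, distance_km, locations in others:
        counts[circle_of(is_friend, distance_km)].update(locations)
    return counts


def gscorr_prob(location, counts, weights):
    """P(l) = sum_x w_x * P(l | S_x), renormalizing weights over non-empty circles."""
    active = [c for c in CIRCLES if sum(counts[c].values()) > 0]
    total_w = sum(weights[c] for c in active)
    if total_w == 0:
        return 0.0
    return sum(weights[c] / total_w * counts[c][location] / sum(counts[c].values())
               for c in active)
```

Renormalizing over non-empty circles keeps the result a proper distribution for a new user whose circles are only partly populated, which is the cold-start setting the model targets.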
In addition to the correlation between social circles and geographical distance, Gao et al. [60] examined check-in patterns derived from check-in history. The authors explained user check-in behaviour in a social network through two patterns, a power-law distribution and a short-term effect, and used them to capture the influence of both social and historical ties on user check-ins. They assumed that (1) user check-in behaviour follows a power-law distribution and (2) check-in history has a short-term effect. Based on these assumptions, they proposed a social-historical model that uses a Hierarchical Pitman-Yor (HPY) process [61] to model a user's historical check-in sequence in LBSNs. The HPY process captures the power-law distribution and the short-term effect and formulates the user's next check-in location from that user's check-in history, as shown in Eq. (13), where $G$ is the distribution of the next check-in location, $d \in [0, 1)$ is a discount parameter that controls the power-law property, $\theta$ is a strength parameter, and $G_0$ is the base distribution [61]. The parameters in Eq. (13) can be inferred from the observed check-in history.
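The following sketch shows Pitman-Yor predictive probabilities in a two-level hierarchy. As assumptions for illustration, the short-term effect is approximated by conditioning on the previous location (a bigram context that backs off to the user's overall check-in distribution, which in turn backs off to a uniform base), and each level seats one table per distinct location, an approximation closely related to interpolated Kneser-Ney smoothing rather than full HPY inference. The function names and default values of `d` and `theta` are hypothetical.

```python
from collections import Counter


def py_predict(counts, base, d, theta):
    """Pitman-Yor predictive probabilities with one table per observed location.

    counts: Counter of observed locations; base: dict location -> G0 probability
    (must cover every observed location); 0 <= d < 1 is the discount and
    theta is the strength parameter.
    """
    c = sum(counts.values())
    if c == 0:
        return dict(base)
    k = len(counts)  # number of tables = number of distinct observed locations
    return {
        loc: ((counts[loc] - d if counts[loc] else 0.0) + (theta + d * k) * g0) / (theta + c)
        for loc, g0 in base.items()
    }


def hpy_next(history, locations, d=0.5, theta=1.0):
    """Next-location distribution: bigram context -> user unigram -> uniform base."""
    uniform = {loc: 1.0 / len(locations) for loc in locations}
    unigram = py_predict(Counter(history), uniform, d, theta)
    if not history:
        return unigram
    prev = history[-1]
    # Short-term effect: counts of locations that followed the previous location.
    bigram = Counter(b for a, b in zip(history, history[1:]) if a == prev)
    return py_predict(bigram, unigram, d, theta)
```

The discount `d` subtracts a fixed amount from every observed count and hands that mass to the base distribution, which is what produces power-law behaviour and leaves nonzero probability for never-visited locations.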
To introduce social ties into the model, a parameter $\eta$ is introduced to denote social influence. Equation (14) is a social-historical model that integrates social ties, where $P_H^i(c_{n+1} = l)$ is the probability that user $u_i$ checks in at location $l$ based on the historical trajectory and $P_S^i(c_{n+1} = l)$ is the probability of checking in at location $l$ based on the user's social ties. The social-tie probability $P_S^i(c_{n+1} = l)$ is computed using Eq. (15), where $N(u_i)$ is the friend set of user $u_i$ and $P_{HPY}^{i,j}(c_{n+1} = l)$ is the probability of user $u_i$ checking in at location $l$, computed using the HPY process and check-in history. Equation (16) is used to compute $P_H^i(c_{n+1} = l)$.
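A minimal sketch of how historical and social predictions can be blended, under two assumptions: Eq. (14) is taken to be a convex combination weighted by $\eta$, and Eq. (15) is taken to average the friends' predicted distributions uniformly. The exact forms in [60] may differ, and the function names are hypothetical.

```python
def social_prob(friend_preds, location):
    """P_S: uniform average of the friends' predicted probabilities for `location`."""
    if not friend_preds:
        return 0.0
    return sum(p.get(location, 0.0) for p in friend_preds) / len(friend_preds)


def social_historical(hist_pred, friend_preds, eta):
    """P(l) = (1 - eta) * P_H(l) + eta * P_S(l), with eta in [0, 1] as social influence."""
    if not friend_preds:
        return dict(hist_pred)
    locations = set(hist_pred).union(*friend_preds)
    return {l: (1 - eta) * hist_pred.get(l, 0.0) + eta * social_prob(friend_preds, l)
            for l in locations}
```

Because both inputs are distributions and the weights sum to 1, the blended output is also a distribution; setting `eta` to 0 recovers the purely historical prediction.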
Sadilek et al. [62] proposed the Friendship and Location Analysis and Prediction (Flap) model to infer user locations even when users have not released their private data. The authors built a Bayesian network that models the effect of a user's friends' movement patterns, serving two functions. First, Flap can infer social relationships from user movement patterns; because it is built on a Bayesian network, Flap is a probabilistic graphical model that relies on probabilistic inference. Second, social relationships improve the results of user location prediction: even when some users keep their locations private, Flap can in some cases infer their locations from their friends' location information, which helps solve the cold-start problem. Flap predicts location using a sequence of visited locations and temporal information about a user's friends, and its output is a sequence of locations that the user has visited over a given period of time. Figure 15 shows a dynamic Bayesian network for modeling the locations visited by a user. Flap learns the parameters of this dynamic Bayesian network, performs inference, and then predicts user locations.
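As a heavily simplified illustration of inference in a two-slice dynamic Bayesian network (not Flap's actual parameterization), the following sketch performs one forward-filtering step. It propagates a belief over the user's hidden location through a transition model, then conditions on friends' observed locations through an assumed co-location likelihood in which each friend observed at a location multiplies that location's weight by a hypothetical factor `alpha`.

```python
def filter_step(belief, transition, friend_locs, alpha=3.0):
    """One forward-filtering step of a simplified two-slice DBN.

    belief: dict location -> P(user at location at t-1)
    transition: dict prev -> dict next -> P(next | prev)
    friend_locs: list of friends' observed locations at time t
    """
    # Time update: propagate the belief through the mobility model.
    prior = {}
    for prev, p_prev in belief.items():
        for nxt, p in transition[prev].items():
            prior[nxt] = prior.get(nxt, 0.0) + p_prev * p
    # Measurement update: each friend observed at `loc` multiplies its weight by alpha.
    post = {loc: p * alpha ** friend_locs.count(loc) for loc, p in prior.items()}
    z = sum(post.values())
    return {loc: p / z for loc, p in post.items()}
```

The predict-then-update structure is the standard way to reason forward through consecutive time slices; with no friend observations, the step simply returns the mobility-model prior.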
Social relation-based methods focus on the effect of social relations on location prediction. These methods can also alleviate the cold-start and data sparsity problems. Social relations represent additional information that can help improve location prediction performance.

Fig. 15 Two consecutive time slices of a Flap dynamic Bayesian network for modeling user movement patterns based on n friends [62].

Time-dependent methods
Most location prediction algorithms focus on geographic or social characteristics while ignoring temporal information. Some algorithms build time-dependent models, and stochastic processes are a better choice for doing so: stochastic process models treat time as a random variable and embed location factors into the process. A typical example is the point process.
Du et al. [63] proposed the Recurrent Marked Temporal Point Process (RMTPP) to model visiting time and location simultaneously. The basic idea of RMTPP is to model movement history using a nonlinear function. RMTPP also uses a recurrent neural network to automatically learn a representation of the influences from a user's mobility history. Figure 16 shows the RMTPP framework.
For a given check-in trajectory $T = \{(t_j, l_j)\}_{j=1}^{n}$, the location $l_j$ of the $j$-th check-in is first embedded into a latent space. Next, the embedded vector and the temporal information are fed into the recurrent layer, where RMTPP automatically learns a representation of the visiting history. The output layer then infers the next location and time from this representation. The core of RMTPP is the recurrent layer, which implements a recurrent temporal point process. Figure 17 shows this process, which learns a general representation of the relation between temporal and spatial information.
Equation (17) defines the hidden layer, which combines three parts: the location information $y_j$, the temporal information $t_j$, and the previous hidden state $h_{j-1}$. Here, $W$ is the embedding matrix, $h_j$ is the representation of the $j$-th check-in event, and $b_h$ is the bias term.
Fig. 16 RMTPP framework [63].

Fig. 17 Two consecutive time slices in the hidden layer of the recurrent marked temporal point process [63].
The core of this model is the hidden-layer representation and the conditional intensity function $\lambda^*(t)$ of the temporal point process, shown in Eq. (18). The conditional intensity function consists of three parts: past influence, current influence, and base intensity. RMTPP uses $\lambda^*(t)$ to model user check-in behaviour with temporal information as the key factor, which distinguishes it from spatial-based algorithms.
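The following sketch illustrates the RMTPP components described above in plain Python. It assumes the forms used in the RMTPP paper: a ReLU hidden update $h_j = \max(W^y y_j + w^t t_j + W^h h_{j-1} + b_h, 0)$, an intensity $\lambda^*(t) = \exp(v^\top h_j + w(t - t_j) + b)$ whose three terms are the past influence, current influence, and base intensity, the corresponding closed-form density of the next event time, and a softmax over candidate locations. Dimensions, weights, and function names are hypothetical, and no training is shown.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def hidden_update(h_prev, y, t, Wy, wt, Wh, bh):
    """h_j = max(Wy y_j + wt * t_j + Wh h_{j-1} + b_h, 0) with a scalar time feature t_j."""
    pre = [a + w * t + c + b for a, w, c, b in zip(matvec(Wy, y), wt, matvec(Wh, h_prev), bh)]
    return [max(p, 0.0) for p in pre]


def intensity(t, t_j, h_j, v, w, b):
    """lambda*(t) = exp(v.h_j [past] + w (t - t_j) [current] + b [base])."""
    past = sum(vi * hi for vi, hi in zip(v, h_j))
    return math.exp(past + w * (t - t_j) + b)


def density(t, t_j, h_j, v, w, b):
    """f*(t) = lambda*(t) * exp(-integral of lambda* from t_j to t); requires w != 0."""
    past = sum(vi * hi for vi, hi in zip(v, h_j))
    integral = math.exp(past + b) * (math.exp(w * (t - t_j)) - 1.0) / w
    return intensity(t, t_j, h_j, v, w, b) * math.exp(-integral)


def next_location_probs(h_j, V):
    """Softmax over candidate locations, one output row of V per location."""
    logits = matvec(V, h_j)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the density has a closed form, the next check-in time can be scored (or its expectation computed numerically) directly from $h_j$, while `next_location_probs` gives the location marker from the same representation.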
In addition, Zarezade et al. [64] proposed a probabilistic model based on point processes, in which a periodic kernel function captures the periodicity of users' visiting times and a multinomial distribution models locations. The overall framework is similar to that of RMTPP. Point processes are effective for modeling time-varying processes, and blending location prediction into temporal point processes is a newly emerging research direction.
In contrast with traditional models, time-dependent methods model time as an important factor in location prediction.Although prediction performance can be improved in this way, the correlation between spatial and temporal information is barely considered.

Representation-based methods
Traditionally, a trajectory is represented as a sequence of points, but some researchers have proposed new trajectory representation methods, such as trajectory feature extraction and deep-learning-based representations.
Noulas et al. [65] treated location prediction as a ranking problem: each check-in is defined as a tuple $\langle l, t \rangle$ of location and time, and the goal is to rank the true location as high as possible in the user's historical list of visited locations. Prediction features are classified as user mobility, global mobility, or temporal features.
The user mobility features include historical visits and categorical preferences. The global mobility features include place popularity, geographic distance, and rank distance. The temporal features capture both the time of a location visit and the temporal patterns associated with significant locations. Using these features, Noulas et al. [65] proposed a ranking model for predicting future locations, in which supervised models learn to place the correct check-in tuple high in the history list. This model extracts features from check-in datasets and thereby offers a new way to approach the location prediction problem. However, it ignores the sequential spatiotemporal structure and the sparsity of check-in data.
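A minimal sketch of feature-based ranking follows, using one simplified feature from each group: the fraction of the user's check-ins at a candidate (user mobility), the candidate's global popularity and its geographic distance from the current location (global mobility), and the fraction of the candidate's check-ins in the current hour (temporal). Noulas et al. [65] learned how to combine features with supervised models; here a linear score with hand-set, hypothetical `WEIGHTS` stands in for the learned model, and the feature definitions are assumptions.

```python
import math

# Hypothetical weights standing in for a supervised model's learned coefficients.
WEIGHTS = {"historical_visits": 2.0, "popularity": 1.0,
           "geo_distance": -0.5, "hour_match": 1.0}


def extract_features(cand, user_hist, global_counts, coords, here, hour, hour_counts):
    """One simplified feature from each group: user mobility, global mobility, temporal."""
    n_user = max(len(user_hist), 1)
    n_all = max(sum(global_counts.values()), 1)
    n_cand = max(sum(hour_counts[cand].values()), 1)
    return {
        "historical_visits": user_hist.count(cand) / n_user,      # user mobility
        "popularity": global_counts.get(cand, 0) / n_all,         # global mobility
        "geo_distance": math.dist(coords[here], coords[cand]),    # global mobility
        "hour_match": hour_counts[cand].get(hour, 0) / n_cand,    # temporal
    }


def rank_candidates(cands, user_hist, global_counts, coords, here, hour, hour_counts):
    """Sort candidates by a linear score of their features, best first."""
    def score(c):
        f = extract_features(c, user_hist, global_counts, coords, here, hour, hour_counts)
        return sum(WEIGHTS[name] * value for name, value in f.items())
    return sorted(cands, key=score, reverse=True)
```

For example, a user at home at 9:00 whose check-ins at the workplace cluster around that hour gets the workplace ranked first, even though home has the same number of historical visits.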
In recent years, deep learning techniques have advanced rapidly, are widely applied [66], and often yield good performance. Many researchers now predict location using deep learning [49,67,68]. For example, Liu et al. [49] extended Recurrent Neural Networks (RNNs) to model time and location, proposing spatiotemporal RNNs (ST-RNNs) for location prediction. Equation (19) gives the representation $h_t^u$ of user $u$ at time $t$, where $S$ is a distance-specific transition matrix, $q$ is the latent vector of a location visited by the user, $C$ is the recurrent connection matrix, and $h_0$ is the initial state. The hidden layers of ST-RNN integrate the effect of geographic distances between locations with the visiting history, so geographic distance is considered directly inside the RNN.

Zhang et al. [67] proposed a deep learning method for predicting citywide crowd flows. The authors first divided the city into grid cells and then constructed an end-to-end deep spatiotemporal residual network, called ST-ResNet, to predict the inflow and outflow of each grid cell. ST-ResNet consists of four major components that model the external, trend, period, and closeness influences on a grid cell. It learns these effects from the inflows and outflows of the city's grid cells and is then trained to predict crowd flows at a given location.
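The ST-RNN hidden-state update described above can be sketched as follows. Each location visited in the recent time window contributes its latent vector $q$ transformed by a distance-specific matrix $S$, and the recurrent term $Ch$ carries the previous state through a logistic sigmoid. Following the interpolation idea of [49] (an assumption about details not stated here), $S$ for an arbitrary distance is linearly interpolated between matrices learned for a lower and an upper distance bound. The time-specific transition matrices of the full model are omitted, and all dimensions and values are hypothetical.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def interp_matrix(d, lower, upper, s_low, s_up):
    """Distance-specific matrix: linear interpolation between two boundary matrices."""
    a = (d - lower) / (upper - lower)
    return [[(1 - a) * l + a * u for l, u in zip(rl, ru)] for rl, ru in zip(s_low, s_up)]


def st_rnn_step(visits, h_prev, C, bounds, s_low, s_up):
    """h = sigmoid(sum_v S_{d_v} q_v + C h_prev) over window visits (distance d_v, latent q_v)."""
    lower, upper = bounds
    acc = matvec(C, h_prev)  # recurrent connection
    for d, q in visits:
        d = min(max(d, lower), upper)  # clip to the interpolation range (assumption)
        s = interp_matrix(d, lower, upper, s_low, s_up)
        acc = [x + y for x, y in zip(acc, matvec(s, q))]
    return [1.0 / (1.0 + math.exp(-x)) for x in acc]
```

With a lower-bound matrix that passes latent vectors through and an upper-bound matrix of zeros, nearby visits influence the hidden state strongly and distant ones hardly at all, which is the geographic-distance effect the model is designed to capture.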
Representation-based methods explore new trajectory representations to predict location. However, finding a good representation that captures the characteristics of trajectory data remains a challenge.

Semantic-based methods
The aforementioned approaches mostly focus on the spatiotemporal space; few researchers have paid attention to trajectories in semantic space. Semantic information lets predictors reason about the purpose of visits, which can improve location prediction results. Ying et al. [23] proposed SemanPredict, a semantic framework for location prediction via semantic pattern mining. The authors first extracted the frequent locations in a user's movement history and the semantic information associated with these movements, and then generated semantic trajectory patterns. SemanPredict stores these patterns in two tree structures. During prediction, it computes scores for paths in the trees to predict the next location. Semantic-based methods thus provide a better understanding of the semantic information associated with location visits.
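SemanPredict stores semantic trajectory patterns in trees and scores paths during prediction. The following hedged sketch uses a single prefix tree (trie) over sequences of semantic labels with support counts, and scores the next label by the relative support of the children below the longest suffix of the user's recent labels that matches a tree path. SemanPredict's actual two-tree design and scoring function in [23] are more elaborate; the class, method names, and `max_context` parameter are hypothetical.

```python
class PatternTree:
    """Prefix tree of semantic trajectory patterns with support counts."""

    def __init__(self):
        self.children = {}
        self.support = 0

    def insert(self, pattern, support=1):
        node = self
        for label in pattern:
            node = node.children.setdefault(label, PatternTree())
            node.support += support

    def predict(self, recent, max_context=3):
        """Score next labels by child support after the longest matching suffix of `recent`."""
        for k in range(min(max_context, len(recent)), 0, -1):
            node = self
            for label in recent[-k:]:
                node = node.children.get(label)
                if node is None:
                    break
            if node is not None and node.children:
                total = sum(c.support for c in node.children.values())
                return {lab: c.support / total for lab, c in node.children.items()}
        return {}
```

Preferring the longest matching context makes predictions as specific as the mined patterns allow, while shorter suffixes act as a fallback when the full recent sequence has never been seen.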

Model categorization
Model categorization distinguishes approaches according to whether they focus on a single moving object or on a group. This categorization provides another perspective for distinguishing location prediction algorithms.

Single-object models
The earliest location prediction studies investigated single-object movement using, for example, Markov chains (Song et al. [43], Ashbrook and Starner [51]), spatiotemporal pattern mining (Bhargava et al. [58], Lee et al. [19], Monreale et al. [56]), and ranking (Noulas et al. [69]). These methods determine the location transition probability, user movement preferences, and movement patterns of a single moving object [70]. Although a user's future locations depend mainly on the user, other factors, such as social information, group movement tendencies, and geographic distance, also play a vital role in location prediction, and these factors are not captured by single-object spatiotemporal data. Single-object models also lack robustness and are sensitive to anomalous and sparse data. Because an individual's movement can be highly random and usually produces sparse data, location transition probabilities and movement patterns cannot effectively model real individual movement.

Group models
Unlike single-object models, group models study a group of moving objects, based on the hypothesis that both humans and animals have social relationships. Group model researchers assume that movement behavior "follows the crowd" to some degree. Typical group methods are based on clustering (Ashbrook and Starner [51]), matrix analysis (Wang et al. [47]), and social relation analysis (Cho et al. [52], Sadilek et al. [62]). The goal of these methods is to extract group movement tendencies or patterns that enable location prediction. Cluster-based methods can be divided into two types: (1) those that determine user clusters to explore user social relations, and (2) those that identify significant locations from the data of all users. Both types determine the correlation between a user and others. Matrix analysis techniques employ matrix factorization and collaborative filtering to obtain common patterns or relationships between users. Social relation information is direct data that reveals a user's social structure; researchers build a social graph to analyze the influence of social information on user movement. Group models often study external influences on users, such as social information, geographic distance, and even the effect of events, and this additional information can improve location prediction performance. Unlike single-object models, group models can reveal the movement of a group of users in some scenarios, and they have attracted interest from government organizations for optimizing traffic flow and city planning.

Hybrid models
Both single-object and group models have advantages and disadvantages, and some researchers have proposed hybrid models that integrate the two. For example, Wang et al. [47] modeled the regularity and conformity terms separately and combined them to predict location. Lian et al. [48] modeled the correlation between periodic user behaviour and novel exploration behaviour. Sadilek et al. [62] learned users' social information in social networks to enhance location prediction. Their model integrates a user's profile, activity (based on an analysis of short messages), temporal information, and location to build a predictor. Generally, hybrid models have three main concerns: (1) developing single-object and group models, (2) correlating or integrating the single-object and group models, and (3) using additional information to improve prediction performance. Recently, hybrid models have become a central focus of location-prediction research due to their strong performance.

Summary
Table 4 provides an overview of some representative approaches with respect to the type of method used, model category, and the data source type.

Location Prediction Evaluations
In this section, we introduce some common trajectory datasets, including GPS, check-in, and Wi-Fi datasets. We then present common evaluation metrics for location prediction.

Datasets
In recent years, there has been an increasing number of public trajectory datasets, including GPS, bike sharing, check-in, and Wi-Fi data.Zheng [11] and Bao et al. [73] have provided summaries of these data.We summarize and supplement these datasets in Table 5.
BrightKite check-in dataset is provided by a location-based social network service provider. This dataset includes 4 491 143 check-ins from 58 228 users, along with their social relation data (214 078 relations), over the period from Apr. 2008 to Oct. 2010 [52].
Gowalla check-in dataset is provided by a location-based social network website in which users share their locations by checking in [52]. This dataset comprises a total of 6 442 890 check-ins from 196 591 users over the period from Feb. 2009 to Oct. 2010.
Foursquare check-in datasets include three datasets: (1) Foursquare 1 is a check-in-only dataset with 12 000 000 check-ins from 679 000 users [74]; (2) Foursquare 2 includes 221 128 check-ins from 49 062 users in New York City and 104 478 check-ins from 31 544 users in Los Angeles, and contains check-in information as well as social relations and user and venue profiles [75]; (3) Foursquare 3 contains the same data types as Foursquare 2, with a total of 33 596 users [60].
GeoLife trajectory dataset, collected by Microsoft Research Asia, includes 17 621 trajectories of 182 users' daily lives over a five-year period (from April 2007 to August 2012) [21,76]. This dataset is useful for analyzing users' long-term movement patterns.
Bike sharing dataset contains 3 million trajectory records generated by 300 000 users and 400 000 bikes.For more details, readers are referred to the website (https://biendata.com/competition/mobike/).
Animal movement dataset contains the radio-telemetry locations (along with other information) of elk, deer, and cattle from 1993 through 1996. For more details, readers are referred to the website (https://www.movebank.org/). This dataset is useful in studies of animal behaviour in the wild.

Approaches summarized in Table 4, with their trajectory data type (A: active; P: passive): Song et al. [43], P; Ashbrook and Starner [51], A; Lian et al. [48], A; Xue et al. [71], P; Cho et al. [52], A; Lee et al. [57], P; Monreale et al. [56], P; Bhargava et al. [58], A; Wang et al. [47], A; Gao et al. [59], A; Gao et al. [60], A; Du et al. [63], A; Zarezade et al. [64], A; Noulas et al. [65], A; Liu et al. [49], A; Zhang et al. [67], P; Zhang et al. [68], P; Ying et al. [23], P; Ying et al. [72], P.
Hurricane tracking dataset contains 1740 trajectories of Atlantic hurricanes over the 1851 to 2012 period, as provided by the U.S. National Hurricane Service (NHS) [19] .This dataset is used in meteorological and natural disaster prevention studies.

Evaluation metrics
Location prediction evaluation mainly focuses on precision, rank, and recall. Precision is measured by accuracy@k (over the top-k locations) and Mean Average Precision (MAP); rank quality is measured by the Normalized Discounted Cumulative Gain (NDCG) and Average Percentile Rank (APR); and recall is typically reflected in the F1 score, which combines precision and recall.
Accuracy@k measures the fraction of predictions for which the real location appears among the top-k predicted locations.
MAP is a metric from the field of information retrieval, defined in Eq. (20), where $N$ is the number of locations and $\mathrm{rel}(r)$ is a binary function indicating whether the location at rank $r$ is relevant.

APR is the average percentile rank (PR) of the predictions for location $l_j$ [47]. PR is defined in Eq. (21), where $\mathrm{rank}(l_j)$ is the position of location $l_j$ in the ranked list and $N$ is the number of locations.
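The metrics above can be computed with a short sketch. Accuracy@k and MAP follow their standard definitions, with average precision accumulating precision at each rank $r$ where $\mathrm{rel}(r) = 1$. For percentile rank, the sketch assumes the normalization $\mathrm{PR} = (N - \mathrm{rank}(l_j))/(N - 1)$, which maps a first-place ranking to 1 and a last-place ranking to 0; the exact form of Eq. (21) in [47] may differ. Function names are hypothetical.

```python
def accuracy_at_k(ranked_lists, truths, k):
    """Fraction of predictions whose true location is in the top-k list."""
    hits = sum(t in ranked[:k] for ranked, t in zip(ranked_lists, truths))
    return hits / len(truths)


def average_precision(ranked, relevant):
    """AP: mean over relevant items of the precision at the rank where each appears."""
    hits, total = 0, 0.0
    for r, loc in enumerate(ranked, start=1):
        if loc in relevant:      # rel(r) = 1
            hits += 1
            total += hits / r    # precision at cutoff r
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(ranked_lists, relevant_sets):
    return sum(average_precision(r, s)
               for r, s in zip(ranked_lists, relevant_sets)) / len(ranked_lists)


def percentile_rank(ranked, truth):
    """PR = (N - rank) / (N - 1): 1.0 when the truth is ranked first, 0.0 when last."""
    n = len(ranked)
    rank = ranked.index(truth) + 1
    return (n - rank) / (n - 1) if n > 1 else 1.0


def average_percentile_rank(ranked_lists, truths):
    return sum(percentile_rank(r, t) for r, t in zip(ranked_lists, truths)) / len(truths)
```

With a single true next location per query, average precision reduces to the reciprocal of that location's rank.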
NDCG is an information retrieval metric used to evaluate the quality of a ranking. For more details, readers are referred to the book Introduction to Information Retrieval [77].
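NDCG, and the F1 score defined next, can be sketched similarly. The sketch uses the standard logarithmic discount $1/\log_2(r + 1)$ for NDCG with graded relevance, and computes F1 as the harmonic mean of precision and recall over sets of predicted and actual locations; treating predictions as sets is an assumption for illustration, and function names are hypothetical.

```python
import math


def dcg(gains):
    """DCG = sum over ranks r (from 1) of gain_r / log2(r + 1)."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def ndcg_at_k(ranked, relevance, k):
    """relevance: dict location -> graded relevance (0 if absent)."""
    gains = [relevance.get(loc, 0) for loc in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over sets of locations."""
    tp = len(set(predicted) & set(actual))
    if tp == 0:
        return 0.0
    precision = tp / len(set(predicted))
    recall = tp / len(set(actual))
    return 2 * precision * recall / (precision + recall)
```

Dividing by the ideal DCG keeps NDCG in [0, 1], so scores are comparable across users with different numbers of relevant locations.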
F1 score is the harmonic mean of precision and recall, as shown in Eq. (22).

Applications and Future Work
In this section, we discuss the applications of location prediction and future work.

Applications
Location prediction has wide applications, and location predictors play an important role in urban management. These models can help alleviate traffic congestion, improve urban planning [78][79][80][81], and help predict crime rates [82]. Location prediction can also help advertisers promote sales based on location. LBSN-based data mining, which links online social networks with real life [83][84][85], is used to recommend friends or locations and to detect anomalous check-in locations. Location prediction is currently developing rapidly, and various techniques are being used depending on the application at hand.

Future work
Location prediction has been the subject of study for many years.However, it continues to face many challenges and problems.Future location prediction systems may have two thrusts: (1) correlating spatial and temporal information, and (2) developing effective and efficient prediction systems.
Correlating spatial and temporal information is a key problem in trajectory data mining. Current methods typically consider the spatial and temporal spaces independently, so new solutions are needed. Current examples include point processes and RNNs; another approach is to identify new representations.
Effectiveness and efficiency are two key, ongoing concerns for location prediction systems. The earliest methods were built on small datasets to validate the methods. The complexity of the methods and limited computing resources constrain prediction systems that involve large-scale matrix factorization and sparse LBSN data. Effective and efficient location prediction systems are critical, as is the ability to apply predictors to large volumes of spatiotemporal data.

Conclusion
In this article, we provided an overview of location prediction, ranging from trajectory data preprocessing to location forecasting and the evaluation of location prediction systems. First, we introduced the basic concepts of location prediction, the different types of data sources, the challenges associated with location prediction, and the location prediction framework. We introduced trajectory data preprocessing methods, then classified location prediction models and discussed them in detail. Next, we categorized location-prediction models as single-object, group, or hybrid models and shared insights about these approaches. We also listed the available public datasets and evaluation methods to help readers conduct their own research. Lastly, we discussed location-prediction applications and future work.

Fig. 2 Popular general framework of location prediction, in which the integration of single-object and group models is an emerging direction.

Fig. 4 Influence of different features on novel check-in location behaviours; x-axes are features and y-axes are conditional probabilities. (a) Distinct number of locations, (b) user entropy, (c) number of days, (d) novelty of check-in location, (e) location entropy, (f) visiting ratio, (g) hour of day, and (h) interval since previous check-in [48].

Fig. 8 Distribution of distances between (a) homes of friends, (b) all users, and (c) 200 large cities, and (d) the probability of friendship as a function of distance [52].

Fig. 9 (a) Spatial model: geographic distribution of check-ins when at home or the workplace. (b) Temporal model: temporal distribution of check-ins when at home or the workplace, denoted by red and blue lines, respectively [52].

$$P[x(t) = x] = P[x(t) = x \mid z_u(t) = 1]\, P[z_u(t) = 1] + P[x(t) = x \mid z_u(t) = 0]\, P[z_u(t) = 0] \qquad (6)$$

Distribution-based methods model geographical characteristics and temporal information as a probability distribution to compute the probability of a user arriving at a location. The difficulty with distribution-based methods is that performance suffers when a user's movement distribution does not fit the model's assumptions.

Pattern-based methods
Trajectory pattern mining is another important branch of location-prediction research including methods based on sequential patterns

Geo-social circles listed in Table 3 [59]:
$S_{F\bar{D}}$: Local friends
$S_{\bar{F}\bar{D}}$: Local non-friends
$S_{FD}$: Distant friends
$S_{\bar{F}D}$: Distant non-friends

Fig. 14 Influence of new check-in behaviour on geo-social correlation [59].

Table 1 Summary of existing trajectory data cleaning methods.

Table 2 Main symbols used in location prediction models: ra, trajectory; st, sub-trajectory; h, history trajectory; u, user.

Table 4 Summary of existing location prediction algorithms (social relation-based prediction algorithm: S-R; time-dependent prediction algorithm: T-D; actively recorded trajectory data: A; passively recorded trajectory data: P).