A Method for Detecting Significant Places from GPS Trajectory Data

Abstract: Detecting significant places is necessary for learning patterns of human behavior, and the Global Positioning System (GPS) provides highly accurate position estimates for mobile tracking. In this paper, we propose a method for detecting significant places from GPS data. It is difficult to determine significant places from clusters formed with distance and time thresholds alone, because noise levels differ. Therefore, we introduce MKEIDBSCAN, a method based on the EIDBSCAN algorithm that detects clusters of arbitrary shape with different densities. We also propose a way to estimate the input parameters of density-based clustering algorithms, namely the radius and the minimum number of points. With our proposal, significant places are detected more accurately and the running time is reduced.


I. INTRODUCTION
Nowadays, many popular mobile applications and services use the exact location or other location information, such as 'Glympse' for sharing your location, 'Locale' for changing behavior in different locations, 'Where' for location recommendation, and 'NeverLate' for adjusting the time to leave for another location. Furthermore, smartphones with many embedded sensors can identify the user's coordinates with highly accurate positioning based on GPS and Wi-Fi.
Conceptually, a significant place is a region where moving objects pause, wait or slow down in order to complete important activities, such as a home, a workplace, a restaurant or a shop. A GPS-enabled device can record the user's location as a time-stamped sequence of latitude and longitude coordinates. Therefore, significant places are formed from GPS points.
There are many methods to detect significant places based on stay points or on the density of GPS points. Stay points are detected in [1]-[5] by extracting consecutive GPS points from trajectories that satisfy stay time and distance thresholds. These stay points are then grouped into Regions of Interest (RoIs) using density-based clustering. The focus of these methods is to detect places where the user stays during a certain time period. Normally, people tend to stay at significant places, so GPS points in these regions are denser than in other regions. A density-based method is presented in [6]: the data space is divided into a grid, and RoIs are groups of nearby dense cells, without considering stay time.
Our contributions: In this paper, we propose a method for detecting significant places using a density-based clustering algorithm, namely the MKEIDBSCAN algorithm. First, we present an effective way to estimate the two input parameters of EIDBSCAN, the fixed radius value ε and the varying number of points MinPts in a circle with that radius, using the proposed MinPts-value method. Moreover, we use the K-medoid algorithm to specify elite points. These elite points are also used as initial points for forming new clusters in MKEIDBSCAN. The MKEIDBSCAN algorithm is based on EIDBSCAN and uses multiple densities to discover clusters. Like the IDBSCAN algorithm, it uses seeds, which are the points closest to eight Marked Boundary Objects. A circle is divided into 8 regions, and the seeds in three consecutive regions are considered for keeping or deleting. Thus, the processing time is reduced. To determine the significant places, we extract cluster information. If a cluster's information satisfies the comparison thresholds, the cluster is considered a significant place.
The rest of this paper is organized as follows. We start with a discussion of related work in Section II. Some basic concepts are defined in Section III. Details of the proposed algorithms for estimating the input parameters and detecting the significant places are presented in Section IV. The experimental results and comparisons are described in Section V. Finally, Section VI presents a conclusion.

II. RELATED WORK
A method for detecting significant places is presented in papers [1]-[4], which use the Geolife dataset. Time and distance thresholds are considered as two scale parameters. If a sequence of consecutive GPS points satisfies both thresholds, a stay point is formed. Two situations lead to bad detections: the case where the satellite signal is lost, and the case where the user moves out of a certain geospatial range for a period. A density-based clustering algorithm, OPTICS, is then applied to compute RoIs from the stay points. In paper [6], the RoI construction algorithm considers a density value, which is the number of GPS trajectories passing through a grid cell. An RoI is a combination of nearby dense cells, on condition that the average density of the region is higher than a threshold. Paper [5] points out two drawbacks of this algorithm: RoIs are determined without considering the stay time, and the density estimate includes moving points in the cell. After detecting stay points, the authors of paper [5] use the Local Outlier Factor (LOF) to extend them into RoIs. They also remove a certain percentage of the stay points with the largest LOF values to obtain clearly separated regions.
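The threshold-based stay-point detection summarized above can be sketched as follows. This is a minimal illustration, not the exact procedure of [1]-[4]: the function names, the input format `(lat, lon, t_seconds)` and the default threshold values are assumptions made for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def detect_stay_points(points, dist_thresh_m=200.0, time_thresh_s=1200.0):
    """points: list of (lat, lon, t_seconds) sorted by time.
    Thresholds are illustrative placeholders, not values from the cited papers.
    Returns a list of (mean_lat, mean_lon, arrive_t, leave_t)."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while points stay within the distance threshold of anchor i.
        while j < n and haversine_m(points[i][0], points[i][1],
                                    points[j][0], points[j][1]) <= dist_thresh_m:
            j += 1
        # points[i:j] lie within the distance threshold; check the time threshold.
        if points[j - 1][2] - points[i][2] >= time_thresh_s:
            window = points[i:j]
            lat = sum(p[0] for p in window) / len(window)
            lon = sum(p[1] for p in window) / len(window)
            stays.append((lat, lon, points[i][2], points[j - 1][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays
```

A lost satellite signal shows up here as a long time gap between two nearby points, which this simple rule would also report as a stay; that is one of the bad-detection cases mentioned above.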
In general, in a significant place, people either tend to move slowly or do not move. Thus, the density of points in these places is much higher than elsewhere. In addition, these locations have different shapes. These are the reasons why multi-density clustering algorithms are considered in papers [7]-[9]. Multi-density clustering raises two problems: the estimation of the two input parameters and the processing time. To estimate the two input parameters, papers [8], [9] use the k-dist method. First, they calculate the average distance from each point to its K closest neighbors. Next, the average distances are sorted in ascending order. Then they plot them and choose the points of sharp change. The average distance values at these change points are taken as the candidate input radius values. Finally, for each candidate radius, they calculate the average number of points in a circle of that radius centered at any point. To reduce the running time, IDBSCAN [10], KIDBSCAN [11] and EIDBSCAN [12] have been proposed. Most of them expand a cluster from seeds, which are the points closest to eight marked boundary objects (MBOs). KIDBSCAN uses the K-means algorithm to find elite points for initializing new clusters. EIDBSCAN divides a circle into 8 regions and considers the seeds in three consecutive regions. If all of these seeds exist, the seed in the second (middle) region is deleted.
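The k-dist procedure described above can be sketched as follows. This is only an illustration under stated assumptions: points are planar `(x, y)` tuples, and `largest_jump` is a hypothetical stand-in for reading a sharp change off the plot, which the cited papers do visually.

```python
import math

def k_dist_curve(points, k):
    """Average distance from each point to its k nearest neighbors, sorted ascending.
    Requires len(points) > k."""
    averages = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        averages.append(sum(dists[:k]) / k)
    return sorted(averages)

def largest_jump(curve):
    """Index i where curve[i + 1] - curve[i] is largest (hypothetical heuristic)."""
    return max(range(len(curve) - 1), key=lambda i: curve[i + 1] - curve[i])

# Four points on a unit square plus one far-away point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
curve = k_dist_curve(pts, k=2)
change = largest_jump(curve)
```

The brute-force neighbor search is quadratic in the number of points; a spatial index would be needed for datasets of realistic size.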

III. BASIC CONCEPTS
Basic concepts are defined as follows:
• The neighborhood within a radius ε of a given object is the ε-neighborhood of the object.
• An object is a core point if its ε-neighborhood contains at least MinPts objects, where MinPts is the minimum number of points within the circle of radius ε.
• A point that is not a core point but lies in the ε-neighborhood of a core point is a border point.
• A noise point is any point that is neither a core point nor a border point.
• An object p is directly density-reachable from an object q if p is within the ε-neighborhood of q and q is a core object.
• An object p is density-reachable from an object q if there is a chain of objects $p_1, \dots, p_n$, where $p_1 = q$ and $p_n = p$, such that $p_{i+1}$ is directly density-reachable from $p_i$.
• An object p is density-connected to q with respect to ε and MinPts if there is an object o such that both p and q are density-reachable from o.
• A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability. This is illustrated in Fig. 1.
• MBOs: eight distinct points are considered as marked boundary objects. Assuming that the core point is P(0, 0), the eight marked objects may be defined as in Fig. 2.
• The points closest to the MBOs within the circle of radius ε are considered as seeds.

IV. METHODOLOGY

A. Parameter Estimation
The ε-radius and MinPts values are estimated.

1) Estimate the ε-radius value
The density is the number of points within a region. Different pairs of radius and MinPts values can yield the same density. To handle different densities, we fix the radius value and vary the MinPts value. We take the bandwidth of the density estimation method as the diameter of the circle. Papers [13], [14] show that the density of human movement in urban areas follows a Gaussian distribution. Thus, the optimal bandwidth value in [15] is assigned to the ε value and calculated by Eq. (1), where n is the number of GPS records of a user and $\sigma_x$, $\sigma_y$ are the standard deviations of the whole GPS sequence in the two dimensions, respectively.

2) Estimate MinPts
To estimate MinPts, we count the number of points within a circle of radius ε centered at each elite point. Then, the changes in density are detected. The method consists of the following steps:
Step 1: Use K-medoid to find the coordinates of K elite points (K center points) in the input data.
Step 2: Calculate the number of points (MinPts-value) in a circle with the fixed radius centered at each of the K elite points.
Step 3: Plot the MinPts-values sorted in ascending order.
Step 4: Find the sharp changes, each of which corresponds to a suitable MinPts value for one density level.
For example, Fig. 3 shows the MinPts-values, sorted in ascending order, counted in circles of radius ε centered at the elite points. Point A scans all MinPts-values to find the sharp changes, which are taken as MinPts. There are three relatively smooth lines, which describe three density levels. Line f shows the densest level, line b shows the sparsest level, and line a shows the MinPts-values of outliers. Lines a and b are taken as a sub-MinPts-value plot to select MinPts1, lines c and d to select MinPts2, and lines e and f to select MinPts3.
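Steps 2 to 4 can be sketched as below. The sketch assumes the elite points are already available (for example from a K-medoid run, which is not implemented here), that points are planar `(x, y)` tuples, and that a sharp change can be approximated by a minimum jump between consecutive sorted counts. The `min_jump` rule is a hypothetical substitute for inspecting the plot.

```python
import math

def minpts_values(points, elites, eps):
    """Step 2 and 3: count points within eps of each elite point, sorted ascending."""
    counts = [sum(1 for p in points if math.dist(p, e) <= eps) for e in elites]
    return sorted(counts)

def sharp_change_indices(values, min_jump):
    """Step 4 (assumed rule): indices i where values[i + 1] - values[i] >= min_jump."""
    return [i for i in range(len(values) - 1) if values[i + 1] - values[i] >= min_jump]

# Toy data: a sparse group, a denser group and one isolated point.
sparse = [(0, 0), (0.2, 0), (0, 0.2)]
dense = [(10, 10), (10.2, 10), (10, 10.2), (9.8, 10), (10, 9.8), (10.2, 10.2)]
points = sparse + dense + [(20, 20)]
elites = [(0, 0), (10, 10), (20, 20)]

values = minpts_values(points, elites, eps=1.0)
changes = sharp_change_indices(values, min_jump=3)
```

Each count includes the elite point itself when it is a member of the data, which matches the usual convention that a point belongs to its own ε-neighborhood.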

B. MKEIDBSCAN Algorithm
Clustering helps to detect the significant places. In reality, significant places have arbitrary shapes. The density of GPS points within a region also varies, depending on how often and how long the user stays there. For this reason, a multi-density method for clustering GPS points is put forward. Moreover, because the number of GPS points is large, the processing time must be considered. The MKEIDBSCAN algorithm is proposed to solve these problems. Firstly, this algorithm selects initial points from the elite points, which are the center points found by the K-medoid algorithm. These elite points are also used to estimate the input parameters, as described in Section IV-A. Secondly, seeds are identified to expand the regions around core points, starting from the elite points. As in the EIDBSCAN algorithm, the number of expansion seeds is reduced when seeds exist in three consecutive regions. Therefore, the running time is reduced significantly.
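Seed selection from marked boundary objects can be sketched as follows. The placement of the eight MBOs at 45-degree intervals on the circle of radius ε is an assumption made for illustration (the paper defines them in Fig. 2), the function names are hypothetical, and the EIDBSCAN pruning of seeds in three consecutive regions is deliberately left out.

```python
import math

def marked_boundary_objects(core, eps):
    """Eight points on the circle of radius eps around core, 45 degrees apart (assumed layout)."""
    cx, cy = core
    return [(cx + eps * math.cos(k * math.pi / 4),
             cy + eps * math.sin(k * math.pi / 4)) for k in range(8)]

def select_seeds(core, neighbors, eps):
    """For each MBO keep the neighbor closest to it; repeated picks collapse to one seed."""
    seeds = []
    if not neighbors:
        return seeds
    for mbo in marked_boundary_objects(core, eps):
        best = min(neighbors, key=lambda p: math.dist(p, mbo))
        if best not in seeds:
            seeds.append(best)
    return seeds

# Neighbors of a core point at the origin; the inner point is never closest to an MBO.
neighbors = [(0.9, 0), (0, 0.9), (-0.9, 0), (0, -0.9), (0.1, 0.1)]
seeds = select_seeds((0, 0), neighbors, eps=1.0)
```

Because at most eight seeds are kept per core point, cluster expansion issues far fewer neighborhood queries than expanding from every neighbor, which is the source of the running-time reduction.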
MKEIDBSCAN is implemented by the following steps:
Step 1: Discover K elite points by applying the K-medoid algorithm. The value of K is set by a simple rule of thumb.
Step 2: Determine the input parameters using Eq. (1) and the MinPts-value method above.
Step 3: Sort the MinPts values in descending order and cluster with each MinPts value and the radius ε in succession.
Step 4: Cluster the GPS points by applying EIDBSCAN, except that the initial points for forming new clusters are the K elite points.
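Steps 3 and 4 can be sketched as a multi-density loop. This is a simplified illustration, not the authors' implementation: it uses plain DBSCAN-style expansion instead of EIDBSCAN's MBO seed reduction, assumes planar `(x, y)` points, and assumes that points clustered at a denser level are removed before the next level is processed. All names are hypothetical.

```python
import math

def multi_density_cluster(points, elite_idx, eps, minpts_levels):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    labels = [-1] * len(points)
    next_id = 0
    for minpts in sorted(minpts_levels, reverse=True):  # densest level first
        # Only points not clustered at a denser level take part in this level.
        active = [i for i, lab in enumerate(labels) if lab == -1]
        active_set = set(active)

        def neighbors(i):
            return [j for j in active if math.dist(points[i], points[j]) <= eps]

        # Elite points are tried first as starting points, then the remaining points.
        starts = [i for i in elite_idx if i in active_set]
        starts += [i for i in active if i not in elite_idx]
        for s in starts:
            if labels[s] != -1:
                continue
            stack = neighbors(s)
            if len(stack) < minpts:
                continue  # s is not a core point at this density level
            labels[s] = next_id
            while stack:
                j = stack.pop()
                if labels[j] != -1:
                    continue
                labels[j] = next_id
                reach = neighbors(j)
                if len(reach) >= minpts:  # expand only from core points
                    stack.extend(reach)
            next_id += 1
    return labels

dense = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (-0.1, 0), (0, -0.1)]
sparse = [(10, 10), (10.8, 10), (10, 10.8)]
points = dense + sparse + [(50, 50)]
labels = multi_density_cluster(points, elite_idx=[0, 6], eps=1.0, minpts_levels=[3, 5])
```

In this toy run the dense group is found at MinPts = 5, the sparse group only at MinPts = 3, and the isolated point stays noise, which illustrates why a single MinPts value would miss one of the two groups.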

C. Significant Place Decision
This is the final step, which decides whether clusters are significant places. It is illustrated in Fig. 4. All points in a trajectory are scanned. A cluster is a significant place when three factors, namely average velocity, stay time and cluster radius, satisfy their thresholds simultaneously. We extract the movement information of the user in each cluster. If the average velocity of the consecutive points of a trajectory in a cluster is less than the velocity threshold, the stay time is longer than the stay time threshold and the cluster radius is smaller than the radius threshold, the cluster is considered a significant place.
The dataset contains many trajectories recorded by different GPS collectors. GPS information is recorded every 1~5 seconds or every 5~10 meters. Moreover, the users' transportation modes were collected, such as walk, bike, bus and car. Every user's GPS log folder stores many trajectory files, which are named by their starting time. A trajectory is formatted as follows: lines 1 to 6 are not used in the dataset and can be ignored. Points are recorded in the following lines.
Field 1: Latitude in decimal degrees
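Once the points of a trajectory have been assigned to clusters, the decision rule of Section IV-C can be sketched as follows. The sketch rests on several assumptions that the text does not fix: points are given in local planar coordinates `(x_m, y_m, t_s)`, the stay time is the span between the first and last timestamp of the consecutive run, the average velocity is path length divided by that span, and the cluster radius is the largest distance from the centroid. The default thresholds are the values reported in the experiments (1.5 m/s, 20 minutes, 200 meters).

```python
import math

def is_significant(run, v_max=1.5, t_min=20 * 60, r_max=200.0):
    """run: consecutive points (x_m, y_m, t_s) of one trajectory inside one cluster."""
    if len(run) < 2:
        return False
    stay_time = run[-1][2] - run[0][2]  # assumed definition of stay time
    if stay_time <= 0:
        return False
    # Average velocity: traveled path length over the stay time.
    path = sum(math.dist(a[:2], b[:2]) for a, b in zip(run, run[1:]))
    # Cluster radius: largest distance from the centroid of the run.
    cx = sum(p[0] for p in run) / len(run)
    cy = sum(p[1] for p in run) / len(run)
    radius = max(math.dist((cx, cy), p[:2]) for p in run)
    return path / stay_time < v_max and stay_time > t_min and radius < r_max

lingering = [(0, 0, 0), (30, 0, 600), (0, 0, 1200), (30, 0, 1800)]
passing = [(0, 0, 0), (1000, 0, 60), (2000, 0, 120)]
results = (is_significant(lingering), is_significant(passing))
```

All three conditions must hold at once, so a slow but brief pause, such as waiting at an intersection, is rejected by the stay time test even though its velocity and radius are small.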

B. Experiment Results
To obtain the clustering results, we combine the input parameter estimation with the running-time reduction method. We apply both the k-dist method and the MinPts-value method to estimate the input parameters of the KIDBSCAN, EIDBSCAN and MKEIDBSCAN algorithms. Table I shows the clustering results of the different algorithms. If the input parameters are different, the number of clusters is also different. The priority of the clusters depends on their density. Therefore, MKEIDBSCAN yields more clusters than the other algorithms. Because K elite points are used to start new clusters and seeds are used to expand them, the running time of MKEIDBSCAN is reduced significantly.
After clustering, we extract the average velocity, stay time and radius of each cluster. Specifically, stay time values are the time intervals spanned by consecutive timestamp sequences. These values are then compared to thresholds to determine the significant locations. In this paper, we set these thresholds as listed in Table II. In Fig. 5, intersections and places in which the GPS signal is lost or the user wanders are detected as stay points and then clustered into interesting locations. Using the MKEIDBSCAN algorithm, these incorrect interesting locations are removed. This is illustrated in Fig. 6.

VI. CONCLUSION
In this paper, we introduced an algorithm to detect significant places from GPS traces, evaluated on the Geolife dataset. GPS points are grouped into clusters and noise. The input parameters are estimated with suitable calculation methods. Then, specific clusters are considered significant places if they satisfy the threshold conditions simultaneously. These conditions are a velocity threshold of 1.5 m/s, a stay time threshold of 20 minutes and a cluster radius threshold of 200 meters. We found that MKEIDBSCAN is an effective algorithm for detecting significant places compared with the other algorithms.

Figure 4. A comparison to determine the significant place

TABLE I. THE CLUSTERING RESULTS USING DIFFERENT COMBINED ALGORITHMS

TABLE II. THE THRESHOLD VALUES FOR COMPARING TO CLUSTER INFORMATION