Extracting Campus’ Road Network from Walking GPS Trajectories

Road network extraction is vital to both vehicle navigation and road planning. Existing approaches focus on mining urban trunk roads from the GPS trajectories of floating cars. However, path extraction, which plays an important role in earthquake relief and village tours, is largely overlooked. To address this issue, we propose a novel approach to extracting campus’ road network from walking GPS trajectories. It consists of two phases: data preprocessing and road centerline generation. Patrol GPS trajectories collected at Hunan University of Science and Technology were used as the experimental data. The experimental evaluation shows that our approach effectively and accurately extracts both the trunk roads and the paths of the campus. The coverage rate is 96.21% while the error rate is 3.26%.

The rest of this paper is organized as follows. Section 2 reviews related work and analyzes its shortcomings. Section 3 presents the framework of our approach. Section 4 describes the approach in detail. Section 5 presents the experimental results and evaluates them both qualitatively and quantitatively. Section 6 concludes the paper and outlines our future work.

Related Work
With urban development, taxis have become an increasingly important means of transport [18][19]. Because taxis need to be monitored [20], a large amount of taxi GPS data is collected, which can be used to generate a road network. A great deal of research has been carried out on road network extraction. Existing methods fall into two main categories: those that extract the road network from images, and those that extract it from GPS trajectories.

Extracting Road Network Based on Images
Liu et al. [7] and Jiang [21] train neural networks on remote sensing image data sets to extract roads. However, this method requires substantial computing resources to train on the data set in the early stage of the experiment. Li et al. [22] improve the computing efficiency of the popular D-LinkNet and increase the accuracy of road network extraction. Liu et al. [23] present a method for extracting main roads from high-resolution grayscale imagery based on directional mathematical morphology and prior knowledge obtained from the Volunteered Geographic Information in OpenStreetMap. However, the method is not suitable for low-resolution images or for roads whose appearance varies (e.g., some road surfaces are dark and others are bright).

Extracting Road Network Based on GPS Trajectories
Gao et al. [24] detect road intersections by using a k-nearest neighbor classifier with a sliding window. Schroedl et al. [25] propose a method similar to the k-means algorithm. To obtain the geometric shape of the turning area, they use the least squares method to fit the centerline of the road and spline curves to fit the GPS trajectories in the intersection area. Xie et al. [26] identify intersections by segmenting GPS trajectories and infer the structure of the road network. Zhang et al. [27] propose a method that updates the road network and improves the accuracy of existing road network information by fusing large volumes of GPS trajectories. Zhang et al. [28] generate a directed graph representing an electronic map by incrementally merging GPS trajectories. Cao et al. [29] optimize GPS trajectories by simulating gravity and repulsion, and gradually generate a graph of curves representing the roads. This method performs well on straight road sections, but the algorithm is inefficient and time-consuming.
Although the approaches mentioned above can extract urban trunk roads, they ignore paths. Therefore, we propose a novel approach to extracting campus' road network from walking GPS trajectories. The approach extracts not only trunk roads but also paths. Our data are walking GPS trajectories collected at Hunan University of Science and Technology [17].

The Framework of Our Approach
Based on walking GPS trajectories, we propose a novel approach to extracting campus' road network. Fig. 1 shows the framework of our approach.
Data preprocessing is the basis for the subsequent steps because raw GPS data is often very large and takes up excessive storage space [30]. First, each pair of adjacent trajectory points is compared, and the redundant one is eliminated. Next, trajectory points recorded while the speed is low or zero, named "stop points", are eliminated. Finally, we use Gaussian filtering to smooth the remaining trajectories.
After data preprocessing, we use a clustering algorithm to merge trajectory points into cluster centers. Then, we segment the cluster centers into several sets, so that the cluster centers in each set can be fitted with a curve. Finally, we fit a quasi-uniform B-spline curve to the cluster centers of each segment to generate a curve representing the road centerline. Fig. 2a shows the raw trajectories before data preprocessing. Fig. 2b shows the trajectories after data preprocessing. Fig. 2c shows the GPS cluster centers. In Fig. 2d, the curves represent the roads extracted by our approach.

Data Preprocessing
While a GPS device records trajectories, it may produce incorrect data due to random noise and error. Such erroneous data directly affects the quality of road network extraction, so data preprocessing is the premise of the subsequent steps. It consists of three steps: redundant trajectory elimination, stop point elimination, and trajectory smoothing.
(1) Redundant trajectory elimination. When the signal of a GPS device is suddenly cut off, the device repeatedly records the previous position. When an object carrying a GPS device travels at a low speed, the device keeps recording nearly identical positions at the set frequency. A large number of such redundant trajectory points reduces the efficiency of our approach, so they must be eliminated. We compare two adjacent trajectory points, p_i and p_{i+1}. If D_i, the distance between them, is less than the threshold, p_{i+1} is regarded as invalid and eliminated.
(2) Stop point elimination. If a moving object moves slowly at a certain position (or even pauses), a large amount of messy GPS data, named stop points, may appear around that position. In this paper, the DJ-Cluster algorithm of Zhou et al. [31] is used to eliminate stop points. We set a threshold for the radius and a threshold for the number of trajectory points, and use the clustering algorithm to eliminate the trajectory points in high-density regions. Fig. 3a shows the trajectories before stop point processing. Fig. 3b shows the trajectories after stop point processing.
(3) Trajectory smoothing. In Fig. 4, some of the green trajectories deviate from the actual roads, which makes the roads appear broken. Therefore, we use Gaussian filtering to suppress random errors and smooth the raw trajectories. The filter parameter is set to 5 times the sampling frequency of the GPS device to avoid distorting normal trajectories [32]. Eq. (1) shows the equation of Gaussian filtering.
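The redundant trajectory elimination and trajectory smoothing steps can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names, the parameters min_dist, sigma, and half_window, and the assumption that points are already projected to planar coordinates in meters are all illustrative. Unlike the pairwise test described above, the sketch compares each point with the last kept point, which is a small variation. Stop point elimination with DJ-Cluster is omitted.

```python
import math

def drop_redundant(points, min_dist):
    """Keep a point only if it lies at least min_dist from the last kept point.

    points: list of (x, y) tuples in planar meters (an assumption of this sketch).
    """
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept

def gaussian_smooth(points, sigma, half_window):
    """Replace each point by a Gaussian-weighted mean of its neighbours in sequence order.

    sigma and half_window are measured in samples, not meters (hypothetical parameters).
    """
    n = len(points)
    smoothed = []
    for i in range(n):
        wsum = xsum = ysum = 0.0
        for j in range(max(0, i - half_window), min(n, i + half_window + 1)):
            w = math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))
            wsum += w
            xsum += w * points[j][0]
            ysum += w * points[j][1]
        smoothed.append((xsum / wsum, ysum / wsum))
    return smoothed
```

Normalizing by the sum of the weights inside the window keeps the ends of a trajectory from being pulled toward the origin, at the cost of slightly weaker smoothing there.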

Road Centerline Generation
The road centerline generation phase includes three steps: trajectory clustering, cluster center segmentation, and road centerline fitting. This phase automatically generates the road centerline from the preprocessed GPS trajectories.
(1) Trajectory clustering. Because GPS trajectories are distributed along the roads, we propose a continuous clustering algorithm. The idea is to cluster the trajectory points along the direction in which the trajectory extends. Here, m denotes the threshold on the number of trajectory points and d denotes the clustering radius; both m and d are constraints of the clustering. If the number of trajectory points in the d-neighborhood is greater than m, all the trajectory points in the d-neighborhood are merged into one cluster center, whose coordinates are determined by all the trajectory points in that neighborhood.
(2) Cluster center segmentation. The goal of cluster center segmentation is to divide the cluster centers into several segments. We segment the cluster centers according to the corner angle at each cluster center and the distance between two adjacent cluster centers. The corner angle at a cluster center reflects how sharply the road changes direction. If the corner angle at a cluster center is greater than the threshold, the cluster center is named an inflection point [33]. Inflection points preserve the characteristics of the road network without losing the information of the global road network.
(3) Road centerline fitting. We use a quasi-uniform B-spline curve to fit the cluster centers of each segment. The curve represents the road centerline. The quasi-uniform B-spline curve is preferable to the uniform B-spline curve because it passes through its first and last control vertices. The quasi-uniform B-spline curve has two features: local control and closeness to the characteristic polygon. The formula of the quasi-uniform B-spline curve is shown in Eq. (2).
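The endpoint property of the quasi-uniform B-spline can be illustrated with a short sketch based on de Boor's algorithm. It is a simplification under stated assumptions: the cluster centers are used directly as control vertices, whereas the approach described here calculates the control vertices so that the curve fits the cluster centers. The names clamped_knots, de_boor, and sample_curve, the cubic default degree, and the [0, 1] parameter range are illustrative choices.

```python
def clamped_knots(n_ctrl, degree):
    """Quasi-uniform knot vector on [0, 1]: each end knot is repeated degree + 1 times
    and the interior knots are evenly spaced. Requires n_ctrl >= degree + 1."""
    n_inner = n_ctrl - degree - 1
    inner = [i / (n_inner + 1) for i in range(1, n_inner + 1)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)

def de_boor(t, ctrl, knots, degree):
    """Evaluate the B-spline at parameter t in [0, 1] with de Boor's algorithm."""
    n = len(ctrl)
    # Locate the knot span k with knots[k] <= t < knots[k + 1]; t = 1 uses the last span.
    k = degree
    while k < n - 1 and t >= knots[k + 1]:
        k += 1
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = [(1.0 - alpha) * d[j - 1][c] + alpha * d[j][c] for c in range(2)]
    return tuple(d[degree])

def sample_curve(ctrl, degree=3, samples=50):
    """Sample the curve at evenly spaced parameter values, including both ends."""
    knots = clamped_knots(len(ctrl), degree)
    return [de_boor(s / (samples - 1), ctrl, knots, degree) for s in range(samples)]
```

Because the end knots are repeated degree + 1 times, the sampled curve starts at the first control vertex and ends at the last one, which is the property that keeps adjacent road segments connected at their shared inflection points.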
Algorithm 1 describes the steps of road centerline generation. In phase 1, we use the clustering algorithm to obtain the cluster centers. In phase 2, we segment the cluster centers into several sets, each containing different cluster centers. In phase 3, we calculate the control vertices of the quasi-uniform B-spline curves and obtain the road centerline by fitting the curves to the cluster centers in each set. The details are as follows.
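Phases 1 and 2 can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: the greedy neighborhood scan stands in for the continuous clustering along the trajectory direction, the centroid is one possible way to determine the coordinates of a cluster center, and the names cluster_centers, turning_angle, segment_centers, max_angle, and max_gap are hypothetical. Points are assumed to be in planar meters and in trajectory order. Phase 3 (curve fitting) is not covered.

```python
import math

def cluster_centers(points, d, m):
    """Phase 1 (simplified): merge every group of more than m unused points that lie
    within radius d of a seed point into one cluster center (their centroid)."""
    used = [False] * len(points)
    centers = []
    for i, p in enumerate(points):
        if used[i]:
            continue
        members = [j for j, q in enumerate(points)
                   if not used[j] and math.dist(p, q) <= d]
        if len(members) > m:
            cx = sum(points[j][0] for j in members) / len(members)
            cy = sum(points[j][1] for j in members) / len(members)
            centers.append((cx, cy))
            for j in members:
                used[j] = True
    return centers

def turning_angle(a, b, c):
    """Change of heading, in degrees, at b when moving a -> b -> c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    diff = abs(h2 - h1)
    return math.degrees(min(diff, 2.0 * math.pi - diff))

def segment_centers(centers, max_angle, max_gap):
    """Phase 2 (simplified): split the ordered cluster centers at inflection points
    (turning angle above max_angle) and at gaps longer than max_gap."""
    if not centers:
        return []
    segments = [[centers[0]]]
    for i in range(1, len(centers)):
        prev, cur = centers[i - 1], centers[i]
        if math.dist(prev, cur) > max_gap:
            segments.append([cur])  # a gap starts an unconnected segment
            continue
        segments[-1].append(cur)
        has_next = i + 1 < len(centers) and math.dist(cur, centers[i + 1]) <= max_gap
        if has_next and turning_angle(prev, cur, centers[i + 1]) > max_angle:
            segments.append([cur])  # the inflection point is shared by both segments
    return segments
```

Sharing the inflection point between the two neighbouring segments means that curves fitted to them separately still meet at that point.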

Experimental Results and Analysis
We evaluate the approach using walking GPS data from Hunan University of Science and Technology [17]. The GPS data covers the teaching region, the administrative office region, the faculty housing estate region, and the student dormitory region. The total area of these regions is 2 million square meters. The data collection frequency is once per second. The experiment is conducted on a machine equipped with an Intel(R) Core(TM) i5-6500 processor at 3.2 GHz and 8 GB of memory.
We analyze the spatial distribution characteristics of the trajectories on the south campus of Hunan University of Science and Technology and use them to build the road network. Then, we use the vectorization tools provided by ArcGIS to convert the experimental results into a vector file. After the file is generated, we apply coordinate conversion and registration to it and obtain a uniform vectorized road network.
Using Baidu Maps and other information as references, we evaluate the experimental results both qualitatively and quantitatively.
(1) Qualitative evaluation compares our experimental results with the existing road network by overlaying layers. Fig. 5a shows the comparison between the experimental results and Google Maps. Fig. 5b shows the comparison between the experimental results and Baidu Maps. In Fig. 5b, the red line indicates the road network of Baidu Maps; the green line indicates the old roads; the purple line indicates the newly extracted paths; the orange line indicates the newly extracted trunk roads. Overall, the newly extracted road network agrees with the existing road network. However, the roads shown in purple and orange are missing from the existing road network. The field survey shows that some roads are missing from Baidu Maps because the existing road network has not been updated in time. The results show that our approach can extract not only roads but also paths, an advantage that road network extraction approaches based on vehicle GPS data do not have.
(2) Quantitative evaluation assesses the results numerically. Here, α represents the coverage rate and β represents the error rate. We first obtain the part of the road network that matches the Baidu road network, and then use both α and β to evaluate the performance of our experimental results [34]. The formula for the coverage rate is shown in Eq. (3), and the formula for the error rate is shown in Eq. (4).
In Eq. (3) and Eq. (4), ϕ represents the set of roads in the Baidu road network, ϕ = {P1, P2, …, Pk}; ψ represents the set of our experimental results; σ represents the set of roads matched between the experimental results and the Baidu road network; len(P) represents the length of road P.
We take the road network of Baidu Maps as the object of buffer analysis [24]. Then, for each buffer radius, we extract the set of roads matched between our experimental results and the Baidu road network. According to Wang et al. [35], the offset between OpenStreetMap (OSM) and the actual road is mainly distributed between -10 meters and 10 meters. Therefore, the buffer radius threshold adopted in this paper can be considered to be within a reasonable range. In the research regions, the length of the Baidu road network is 4,992.24 meters, while the length of our experimental results is 6,410.83 meters. Among the experimental results, the length of trunk roads is 839.86 meters, and the length of paths is 338.87 meters.
As shown in Tab. 1, when the buffer radius is 10 meters, the coverage rate and the error rate of our experimental results are 98.73% and 18.84%, respectively. As shown in Fig. 5b, the main reason for the high error rate is that our approach identifies roads that have not yet been updated in Baidu Maps. If we ignore those roads, the error rate drops significantly, from 18.84% to 3.26%. This indirectly shows that our approach can effectively update and correct the existing road network.
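A buffer-based evaluation of this kind can be sketched as follows. The definitions in the code are assumptions and do not reproduce Eq. (3) and Eq. (4): the coverage rate is taken as the share of the reference network length lying within the buffer radius of the extracted network, and the error rate as the share of the extracted network length lying outside the buffer of the reference network. Road networks are modeled as lists of polylines in planar meters, and matching is approximated by sampling points along each polyline. All function names and the step parameter are illustrative.

```python
import math

def polyline_length(line):
    """Total length of one polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def distance_to_network(p, network):
    """Shortest distance from p to any segment of any polyline in the network."""
    return min(point_segment_distance(p, a, b)
               for line in network for a, b in zip(line, line[1:]))

def length_within_buffer(network, other, radius, step=1.0):
    """Approximate length of `network` within `radius` of `other`, found by testing
    the midpoint of pieces that are at most `step` meters long."""
    inside = 0.0
    for line in network:
        for a, b in zip(line, line[1:]):
            seg = math.dist(a, b)
            n = max(1, math.ceil(seg / step))
            for k in range(n):
                t = (k + 0.5) / n
                mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
                if distance_to_network(mid, other) <= radius:
                    inside += seg / n
    return inside

def coverage_and_error(reference, extracted, radius, step=1.0):
    """Return (coverage rate, error rate) under the assumed definitions above."""
    ref_len = sum(polyline_length(line) for line in reference)
    ext_len = sum(polyline_length(line) for line in extracted)
    alpha = length_within_buffer(reference, extracted, radius, step) / ref_len
    beta = 1.0 - length_within_buffer(extracted, reference, radius, step) / ext_len
    return alpha, beta
```

Under these definitions, a road that exists on the ground but is absent from the reference map counts toward the error rate, which is consistent with the error rate dropping once such roads are set aside.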

Conclusions
As GPS technology matures, extracting road networks from GPS trajectories is becoming increasingly prevalent. Existing approaches focus on mining urban trunk roads. Path extraction, which is valuable in earthquake relief and village tours, is largely overlooked.
Therefore, we propose a novel approach to extracting campus' road network from walking GPS trajectories. It includes two parts: data preprocessing and road centerline generation. Data preprocessing consists of three steps: redundant data elimination, stop point elimination, and trajectory smoothing. Road centerline generation also contains three steps: trajectory clustering, cluster center segmentation, and road centerline fitting. We use the GPS trajectories collected at Hunan University of Science and Technology to extract the campus' road network, and evaluate the experimental results both qualitatively and quantitatively. The experimental results show that the coverage rate of the road network is 96.21%, while the error rate is only 3.26%. Moreover, our approach can be applied to road network updating because it extracts both trunk roads and paths effectively and efficiently.
Road intersection identification also plays an important role in vehicle navigation and road planning. In our future work, we aim not only to extract a more accurate road network but also to identify road intersections precisely.