A Novel Spatio–Temporal Deep Learning Vehicle Turns Detection Scheme Using GPS-Only Data

Whether the computer is driving your car or you are, advanced driver assistance systems (ADAS) come into play on all levels, from weather monitoring to safety. These modern-day ADASs use various assisting tools to keep the journey safe; these tools provide early signals of numerous events, such as road conditions, emerging traffic scenarios, and weather warnings. Many urban applications, such as car-sharing and logistics, rely on accurate and up-to-date road map data. Map generation methods use a variety of data sources, including but not limited to global positioning systems (GPS). In this research, we propose a GPS-only trajectory analysis and a novel scheme that converts GPS trajectory data into image-based data to train a custom Convolutional Neural Network (CNN) model. The empirical results, obtained with an extensive 5-fold cross-validation, show that the proposed scheme distinguishes turns from non-turns with more than 94% recall. It outperforms the existing turn detection schemes on two major fronts: the data required and the accuracy achieved in detecting different driving behaviors.


I. INTRODUCTION
Vehicle driver behavior entails multiple aspects of a driver's trip from one place to another. Measurable driver behavior contributes to various parts of Advanced Driver Assistance Systems (ADASs) by providing essential information for decision-making. In particular, driver trajectory analysis offers solutions to numerous problems, including but not limited to map construction, business affair analysis, and driver behavior analysis. A vehicle trajectory is an ordered sequence of timestamps and sampled locations along a moving path. With the help of location providers such as the Global Positioning System (GPS), it has become possible to collect vehicle trajectory information. Vehicle trajectory analysis has previously been used for various purposes, including but not limited to anomaly detection [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Venkata Ratnam Devanaboyina.
Fu et al. [1] proposed a two-layer hierarchical clustering analysis that enables the identification of dominant pathways and lanes. The clustering findings also allow for the detection of unique pathways. Yu et al. [5] suggested a trajectory description approach to construct a semantic model for the automated identification of traffic infraction occurrences. Rahim et al. [7] presented a driver identification system using minimal data. The approach utilizes global positioning system (GPS) data to determine an individual's driving style. Rahim et al. [6] presented an event-driven system that uses GPS as its only data source. Quehl et al. [8] proposed a method for predicting vehicle trajectories based on maps that contain information about the behavior of traffic participants in a specific region. Wu et al. [9] proposed a probabilistic model-based method for identifying trajectory outliers. Guo et al. [10] proposed a method for predicting the business affairs of a traveler based on trajectory analysis. To assess vehicle behaviors, they focused on the segmentation of GPS trajectory data collected in logistics transportation. Guo et al. [11] used graph theory to analyze vehicle trajectories and extract various patterns from the movement of vehicles. They presented a graph-based method for analyzing trajectories as a complex network.
Using efficient and precise methods to compare trajectories is a crucial aspect of trajectory analysis. Sousa et al. [12] provided a summary of vehicle trajectory data, including models and preparation methods. Wang et al. [13] compared the performance of six widely used trajectory similarity metrics using a vehicle trajectory dataset. Magdy et al. [14] proposed a comparative analysis of frequently employed trajectory similarity metrics, highlighting their merits and shortcomings. Zheng [15] discussed multiple data mining techniques used for vehicle trajectory analysis. This survey investigated the relationships, correlations, and distinctions between current approaches. It also described the methods for transforming trajectories into various data types, including graphs, matrices, and tensors.
Schlechtriemen et al. [16] proposed a method for identifying traffic participants' intentions to shift lanes. In a large-scale experiment using real-world data, they trained and validated the intention prediction model using more than 160 samples. They achieved an average accuracy of 93% for lane-change detection. Mandalia et al. [17] presented a support vector machine (SVM)-based method for estimating driving intentions. SVMs were particularly successful at spotting driver lane changes early on. They achieved an average accuracy of 89.22% for 5-second non-overlapping windows. Ramanishka et al. [18], whose dataset is used in this research, proposed a computer vision-based scene annotation tool to facilitate research on understanding driver behaviour from untrimmed data sequences. It achieves 98% accuracy but requires extensive hardware to run the proposed machine learning model and application. Mao et al. [19] discussed measures for checking the similarity of trajectories in an urban environment and the effect of different transforms. According to the study, trajectory data processing techniques make it possible to mine the patterns of human activities and vehicle movement. This research presented a segment-based dynamic temporal warping technique for mining trajectory data.
Wang et al. [20] used computer vision to analyze and differentiate between pedestrian and vehicle trajectories. They constructed semantic scene models based on long-term observations of the scene's moving items. The trajectories of automobiles and people are clustered initially, followed by clustering based on spatial and velocity distributions. Wei et al. [21] analyzed the left-turn behavior of vehicles to provide better guide lanes for left turns at intersections. They presented a technique for detecting irregularities in a vehicle's approach by analyzing its velocity patterns. Chan [22] studied human behavior in traffic through trajectory analysis and discussed criteria for safety warnings at intersections. Nowosielski and Forczmański [23] proposed a vision-based approach for the analysis of different vehicle trajectories. Santhosh et al. [24] surveyed different vision-based approaches for abnormal behavior detection in traffic. Ahmed et al. [25] investigated various trajectory and vision-based systems for various surveillance applications. Lin et al. [26] used graph positioning with a vision-based approach to analyze the trajectories of different vehicles; they utilized graph feature positioning to detect and analyze vehicle trajectories from video frames. Bian et al. [27] surveyed different algorithms for trajectory analysis, including various contemporary machine learning algorithms. Sekh et al. [28] proposed an unsupervised trajectory clustering method that ranks trajectories to identify abnormal behaviors through vision-based object tracking.
Kan et al. [29] performed traffic congestion analysis using GPS data and road network data. Tang et al. [30] proposed a method to analyze GPS trajectories and construct road intersections through trajectory analysis. Deo and Trivedi [31] classified trajectories into multiple maneuvering events using a neural network model proposed for this application. Salvo et al. [32] used an unmanned aerial vehicle to capture traffic and performed an analysis of macro-level traffic flow using trajectories analyzed with computer vision.
VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Algorithm (fragment; steps 1-5, 14-17, and 23-29 are not available):
    ... Append j to the win array
    7: end for
    8: Append win array as a column to G
    9: lab_p = {null : for j >= 1 and j < c_w}
    10: W = {c_w zero matrices, each sized H_p x W_p}
    11: for all i in range 1 to c_w do
    12: G_i = {G_j : for j < k and j.win = i}
    13: end for
    18: if P_95(m_i) P_10(m_i) < 0 then lab_p_i = 'Not Turn'
    19: else lab_p_i = 'Turn'
    20: end if
    21: for all g_j in G_i do
    22: x_j, y_j = Convert Geo Coordinate to Cartesian Coordinates g_j.lat_j, g_j.lng
    30: end for
    31: return W, lab_p
This research aims to provide a deep learning-based solution for turn detection for driver assistance systems using trajectory analysis. This research work:
1) Utilizes GPS-only trajectory data to detect turns in naturalistic driving.
2) Creates spatio-temporal windows of the GPS trajectory, which allows convolutional deep learning to be applied to the trajectory data for improved turn detection. A spatio-temporal window consists of spatial references showing the trajectory of the vehicle moving within that window, whereas the temporal part is reflected by how much time the vehicle stayed at a certain point of the trajectory within the window.
3) Employs a CNN model to classify turns and non-turns without any image-based data.
4) Performs significantly well compared to research works that utilize extensive data or require much more sophisticated hardware.
The rest of this paper is organized as follows. The Spatio-temporal Vehicle Turn Detection section describes GPS-only turn detection and proposes algorithms for converting the GPS data to spatial window images and for labeling the windows. The CNN-based Classification Model section applies deep learning modeling to classify GPS-based windows into turn and non-turn. The Empirical Analysis and Results section discusses the empirical study in detail and presents an analysis. Figure 1 illustrates the overall stages of the proposed research work in graphical form.

II. SPATIO-TEMPORAL VEHICLE TURN DETECTION
GPS-only turn detection is proposed in Algorithm 1. The algorithm takes chronologically ordered GPS data G as input, wherein each tuple consists of four basic data elements: time t, latitude lat, longitude lng, and orientation ori. The elements are indexed by subscripts, so that the orientation of the i-th tuple is denoted ori_i. The remaining inputs are the spatial window size, size, in feet, and the dimensions of the output matrix, with height H_p and width W_p. The proposed model uses spatial windows to detect turns from GPS-only data; the method createSpatialWindows that builds them is given in Algorithm 2. Each spatial window is analyzed for a specified pattern of the orientation ori. The getBoundingBox method creates a spatial fence of a given size around a location, and the inBoundingBox method tests whether a geographical location lies inside a given fence. Curve fitting is used as an additional measure to reduce the number of false non-turns: a straight line is fitted to the Cartesian coordinates converted from the latitudes and longitudes in G_i. The deviation between the fitted straight line and the trajectory segment of a window is then examined to judge whether the label is correct or needs updating. The constant c_change is used to validate the label at this stage. The TrajectoryImageGeneration method, used for trajectory-to-matrix conversion in Algorithm 1, is adapted from [33]; it takes a GPS trajectory and the window sizes as input and returns an H_p x W_p matrix representing the trajectory. The appendColumn method is a trivial method that appends a data column to a relation; it takes two inputs, the name of the column and the column data in the form of a sequence. The getSpatialWindow method retrieves the identification number of the spatial window to which a given location belongs.
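The window helpers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names only mirror getBoundingBox, inBoundingBox, and TrajectoryImageGeneration; the feet-to-degree conversion is a flat-earth approximation; samples are assumed to be equally spaced in time, so that a cell count stands in for dwell time; and the principal-axis line fit is one possible reading of the curve-fitting step, whose deviation could then be compared with c_change.

```python
import math

# Assumption: roughly 364,000 feet per degree of latitude (flat-earth approximation).
FEET_PER_DEG_LAT = 364_000.0


def get_bounding_box(lat, lng, size_ft):
    """Square fence (south, west, north, east) whose sides lie size_ft from the centre."""
    d_lat = size_ft / FEET_PER_DEG_LAT
    d_lng = size_ft / (FEET_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lat - d_lat, lng - d_lng, lat + d_lat, lng + d_lng)


def in_bounding_box(box, lat, lng):
    """True when the location lies inside the fence."""
    south, west, north, east = box
    return south <= lat <= north and west <= lng <= east


def trajectory_to_image(points, box, height, width):
    """Rasterise (lat, lng) samples into a height x width grid of sample counts.

    With equally spaced samples, a larger count means the vehicle stayed longer.
    """
    south, west, north, east = box
    image = [[0] * width for _ in range(height)]
    for lat, lng in points:
        if not in_bounding_box(box, lat, lng):
            continue
        row = min(int((north - lat) / (north - south) * height), height - 1)
        col = min(int((lng - west) / (east - west) * width), width - 1)
        image[row][col] += 1
    return image


def to_cartesian(points, lat0, lng0):
    """Local x/y coordinates in feet relative to (lat0, lng0)."""
    k_lng = FEET_PER_DEG_LAT * math.cos(math.radians(lat0))
    return [((lng - lng0) * k_lng, (lat - lat0) * FEET_PER_DEG_LAT)
            for lat, lng in points]


def max_line_deviation(xy):
    """Largest perpendicular distance from the points to their best-fit line.

    The line passes through the centroid along the principal axis, so vertical
    segments are handled as well as horizontal ones.
    """
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    syy = sum((y - my) ** 2 for _, y in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return max(abs(-(x - mx) * math.sin(theta) + (y - my) * math.cos(theta))
               for x, y in xy)
```

A straight segment yields a deviation close to zero, while an L-shaped segment yields a deviation on the order of the segment length, which is the kind of signal a threshold such as c_change could act on.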
The ground truth data is visualized in Figure 2. The spatial windows, created using Algorithm 2, form the ground truth when combined with the truth labels. Each spatial window in the figure shows the trajectory of a vehicle within that window; higher values at certain points, shown in bright yellow, indicate that the vehicle stayed longer in that part of the window than in the rest of it. The labels assigned through Algorithm 1 are then validated against the ground truth, as discussed in subsection Dataset.
Trajectory images differ from natural images in two respects. (1) Natural images contain rich texture and color information and consist of three channels (red, green, and blue), while trajectory images significantly lack texture and color information. (2) Because they lack texture and color information, trajectory images suffer from inter-class similarity, so that trajectory images of different classes may appear similar. For these reasons, it is hard for existing deep learning models to draw decision boundaries and accurately classify the images into the desired categories. Deep learning models have enjoyed tremendous success in various object classification, detection, and segmentation tasks on natural RGB images; however, their performance degrades when they are directly employed to classify trajectory images. This may be attributed to the fact that the receptive fields of these models are small, so the models cannot extract the rich contextual information required to classify trajectory images. For this reason, we propose a shallow and effective deep learning model for the classification of trajectory images.
The detailed architecture of the proposed deep learning model is shown in Table 1. The proposed model follows a pipeline similar to that in [34]. The network consists of a stack of six convolutional and pooling layers followed by two fully connected layers. Convolutional layers extract low-level features, for example, edges, color, and texture, as well as high-level contextual information from the input image. Each convolutional layer consists of a convolutional kernel of fixed size that convolves over an input feature map. Each convolutional layer is followed by a ReLU non-linear activation function and a batch normalization function. Let K ∈ R^(3×K_x×K_y) be an image that is provided as input to the stack of convolutional layers, where 3 is the number of channels of the input image and K_x and K_y are its width and height. Let the output feature map be M ∈ R^(c×w×h), where c is the number of channels and w and h are the width and height of the output feature map. After the convolutional layers, we employ pooling layers that reduce the size of the feature map and thereby the number of parameters in the subsequent layers. We apply a max-pooling layer (with a window of fixed size) to the output of the convolutional layers. The max-pooling layer chooses the maximum value in each window, which represents the most prominent feature in that window. Another advantage of max pooling is that it helps protect the network against over-fitting. The feature map generated by passing the input image through the stack of convolutional and pooling layers is finally provided as input to the fully connected layers. The fully connected layers take the high-dimensional input feature vector F_v and output the classification score vector C_v, which contains the scores of the different classes for F_v.
For training the network, we resize the input training images to make them compatible with the input size of the network. Our training data are single-channel binary images, whereas the network accepts 3-channel images. We therefore transform each single-channel binary image into a 3-channel image by copying the single channel three times, as in [35].
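The input preparation described above can be illustrated with a short Python sketch that operates on nested lists. The nearest-neighbour resizing is an assumption, since the interpolation method is not stated in the text, and both function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_three_channels(image):
    """Stack three independent copies of a single-channel image (channels first)."""
    return [[row[:] for row in image] for _ in range(3)]


if __name__ == "__main__":
    window = [[0, 1],
              [1, 0]]
    prepared = to_three_channels(resize_nearest(window, 4, 4))
    print(len(prepared), len(prepared[0]), len(prepared[0][0]))  # 3 4 4
```

Because the three channels are identical copies, the replication adds no information; it only matches the input shape that the network expects.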

A. DATASET
Honda Research Institute (HRI) published the HRI Driving Dataset (HDD) [18]. The dataset contains a large amount of data from various sensors and post-processing, with a focus on computer vision. It provides labels for different driving actions; the labels are provided with the video sessions and sensor information, including Global Positioning System (GPS) data. The data used in this research consists of several variables, including GPS latitude, longitude, orientation, and the label of the event. Of the twelve labels available in the dataset, the driving events considered in this study are the left turn and right turn labels. The data used in the experiments consists of 43 hours of human driving. The labels are mapped to the GPS data, and the labeled GPS data provides the ground truth for the turn and non-turn classes used in the experiments.

B. MODEL PERFORMANCE
A set of statistical analyses provides an overview of the performance of the proposed model. These analyses are based on a set of fundamental measures: true positive, true negative, false positive, and false negative. A true positive, denoted TP, is a case where the decision and the actual class agree on the positive class, that is, a turn sample is identified as a turn by the model. A true negative, denoted TN, is a negative-class sample classified as negative. False positives and false negatives are labels that do not match the class in the ground truth. A false positive, FP, is a sample incorrectly labeled as positive, while a false negative, FN, is a sample labeled as negative although the ground truth is positive.
Receiver operating characteristic (ROC) analysis is used to elaborate the results. ROC analysis indicators used in testing the performance of this research are expressed as follows.
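Precision and recall, the indicators reported fold by fold in this section, have standard definitions in terms of the measures above: precision = TP / (TP + FP) and recall = TP / (TP + FN). The following minimal Python sketch computes them, with accuracy = (TP + TN) / (TP + TN + FP + FN) added for completeness. The function names are chosen for illustration, and the counts in the usage example are toy values, not results from the paper.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


if __name__ == "__main__":
    # Toy confusion counts for illustration only.
    tp, tn, fp, fn = 9, 8, 2, 1
    print(round(precision(tp, fp), 3))         # 0.818
    print(round(recall(tp, fn), 3))            # 0.9
    print(round(accuracy(tp, tn, fp, fn), 3))  # 0.85
```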
Approximately 4500 100-feet spatial windows are used in the experiment to test the proposed approach for turn detection using GPS-only data. A 100-feet spatial window covers an area of approximately 4 × 10^4 square feet, because the 100-feet size is the perpendicular distance from the center to any side, which gives a 200 ft × 200 ft square. As discussed earlier, the labels for the generated windows are available in the form of ground truth, where each GPS record carries a label; any label other than turn is considered non-turn. Using Algorithm 1, a set of predicted labels is obtained for all windows. The labeled windows are then used in the experiment for automatic labeling of windows based on their appearance. A CNN model is trained and tested using 5-fold cross-validation. The labeled windows are divided into two classes, Turn and NotTurn. For each class, we randomly select 500 samples to ensure a balanced dataset. In each fold, 20% of the data is held out for testing and 80% is used for training. The fold-by-fold precision and recall are listed in Table 2. It is observed that the trained model classifies 94% of the training samples with significantly high precision. Compared to the studies [2], [16], [17], and [18], the proposed work performs better at classifying different driving behaviors in terms of accuracy or data requirements. Figure 3 shows the high recall achieved for each class in the different cross-validation folds.
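The sampling and splitting protocol described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the fixed seed, and the round-robin fold assignment are assumptions, and the class counts in the usage example are hypothetical.

```python
import random


def balanced_sample(labels, per_class, seed=0):
    """Pick per_class random sample indices from every class, then shuffle them."""
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        indices = [i for i, lab in enumerate(labels) if lab == cls]
        chosen += rng.sample(indices, per_class)
    rng.shuffle(chosen)
    return chosen


def k_fold_indices(indices, k=5):
    """Yield (train, test) index lists; each fold holds out 1/k of the data."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical class counts, used only to exercise the functions.
    labels = ["Turn"] * 700 + ["NotTurn"] * 3800
    selected = balanced_sample(labels, per_class=500)
    for train, test in k_fold_indices(selected, k=5):
        print(len(train), len(test))  # 800 200 on every fold
```

Each sample appears in exactly one test fold, so every selected window is used for testing once and for training four times.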
The study [2] detects lane changes with an accuracy as low as 58% and as high as 97%, depending on the classification algorithm. It must be noted that this study uses magnetometer, accelerometer, and gyroscope sensors in addition to GPS for data collection. The study [16] achieved recalls of 73% and 77% for lane-change detection with a two-fold cross-validation approach, using a camera and several radar sensors. The study [17] achieved a recall of 87% for lane-change detection using steering angle and speed vectors. The study [18] used a camera and numerous sensors to collect data and perform behavior annotations on the video. The difference between the annotations produced by their model and those of a human annotator was 2%. The cost of such systems in a real-time application will be large.

IV. CONCLUSION
Turn detection is a crucial application for modern ADAS in vehicles. Various models for this application are available in the community, but they rely on either computer vision or multiple data sources, which makes them computationally or financially expensive. This research work proposes a deep learning-based turn detection model for GPS-only data. We propose spatial window-based algorithms that transform GPS-only data into image data for use in the deep learning process. A deep learning model is proposed for the classification of turns and non-turns and is tested on a real-time driving dataset. Five-fold testing of the deep learning classifier shows that the proposed work achieves up to 94% accuracy. Compared with other works that use extensive data or far more powerful hardware, the proposed research performs noticeably better.
SULTAN  MUHAMMAD RASHID received the Ph.D. degree in computer science from the National University of Computer and Emerging Sciences. He has over ten years of academic experience during which he has been the head of various departments and units for over eight years. He is currently working as the Head of the Department of Computer Science, National University of Technology (NUTECH), Pakistan. His research interests include virtual reality, artificial intelligence, and applied machine learning.
RAFI ULLAH received the M.S. degree in computer system engineering from GIKI, Pakistan, in 2006, and the Ph.D. degree from PIEAS, Pakistan, in 2010. He is currently working as an Associate Professor with the Department of Computer Science, National University of Technology, Pakistan. His research interests include image and video processing, digital watermarking, and EEG signals.
HANAN TARIQ (Graduate Student Member, IEEE) received the B.S. degree in electrical (power) engineering from COMSATS University Islamabad, Abbottabad Campus, Pakistan, in 2016, and the M.Sc. degree in electrical engineering from the Imperial College of Business Studies (ICBS), Lahore, Pakistan, in 2018. He is currently a Ph.D. Scholar with the Department of Electrical Power Engineering, Gdańsk University of Technology, Poland. He has been associated with research and teaching at multiple universities, since 2017. His research interests include electrical safety, power system stability, and dynamics and simulative studies of power systems.
STANISLAW CZAPP (Member, IEEE) received the degree from the Gdańsk University of Technology, Poland, in 1996, and the Ph.D. and D.Sc. degrees, in 2002 and 2010, respectively. He is currently an Associate Professor with the Faculty of Electrical and Control Engineering, Gdańsk University of Technology. He is the author and coauthor of many articles, conference papers, and unpublished studies, such as designs and expert evaluations as well as opinions. His research interests include power systems, electrical installations and devices, electric lighting, and electrical safety. He is an Expert of the SEP Association of Polish Electrical Engineers in Section 08 Electrical Installations and Devices.