ANALYSIS OF SPATIOTEMPORAL DATA TO PREDICT TRAFFIC CONDITIONS AIMING AT A SMART NAVIGATION SYSTEM FOR SUSTAINABLE URBAN MOBILITY

Abstract: Urban traffic congestion is created by unsustainable transport systems and is a crucial problem for urbanised areas, provoking air pollution, heavy economic losses due to wasted time and fuel, and social inequity. Mitigating this problem can improve efficiency, connectivity, accessibility, safety and quality of life, which are crucial parameters of sustainable urban mobility. Encouraging sustainable urban mobility through smart solutions is essential to make cities more liveable, sustainable and smarter. In this context, this research uses the spatiotemporal data that taxi vehicles provide to develop an intelligent system able to predict traffic conditions and provide navigation based on these predictions. GPS (Global Positioning System) data from taxis are analysed for the case of the city of Thessaloniki. Through data mining and a map-matching process, the most appropriate data are selected for travel time calculations and predictions. Several algorithms are investigated to find the optimum one for predicting traffic states in this case study, concluding that ANN (Artificial Neural Networks) outperform the others. Then, a new road network map is created by producing spatiotemporal models for every road segment under investigation through a linear regression implementation. Moreover, the possibility of predicting vehicle emissions from travel times is investigated. Finally, an application with a graphical user interface is developed that navigates users according to three criteria: the shortest path in terms of trip length, the shortest path in terms of travel time, and the "eco" path. The outcome of this research is an essential tool for drivers to avoid congestion spots and save time and fuel, for stakeholders to reveal the problematic parts of the road network that need amendments, and for emergency vehicles to reach the emergency spot faster.
In addition, an indicator-based qualitative assessment of the proposed navigation system concludes that it contributes significantly to environmental protection and the economy, thereby enhancing sustainable urban mobility.


Introduction
In the era of "Big Data", spatiotemporal data dominate many domains such as climate science, social science, epidemiology, transportation, mobile health and Earth sciences (Atluri et al., 2017). In the transportation domain, Intelligent Transportation Systems (ITSs) are becoming increasingly essential (Cotton, 2013). ITSs use sensor technology such as the Global Positioning System (GPS) to gather spatiotemporal data (D'Andrea et al., 2016). GPS devices enable continuous logging of equipment location while simultaneously recording timestamps (Pradhananga et al., 2013); thus, they provide valuable spatiotemporal data that can contribute to various studies. More specifically, both vehicles and mobile phones can supply GPS data. Amongst them, taxi vehicles can provide data with broader coverage of the road network and high frequency. Nowadays, many taxis have GPS receivers installed, contributing to the production of massive trajectory data (Cai et al., 2017). Data collection from taxis' GPS receivers is considerably easier than collection from mobile phones, floating cars and cargo transport vehicles. What is more, these data can provide accurate information concerning the travel route and travel time. Taxis' GPS data are valuable for the development of solutions that can mitigate the traffic congestion problem and improve mobility in urban areas. Mobility is a crucial issue, as it is significantly constrained in these areas (D'Orey et al., 2014). The majority of the population, as well as of economic development, is concentrated in urban areas, especially in European countries, which intensifies traffic congestion problems. As a result, the traditional transport system accounts for a sizeable percentage of total carbon emissions, air pollution, degraded health, resource inefficiencies leading to oil price increases, and traffic congestion (Lam et al., 2012).
In the USA, traffic congestion is responsible for the waste of 1.9 billion gallons of fuel and 4.8 billion hours of extra time, and for 101 billion dollars in delay and fuel costs, while in the European Union congestion costs 1% of total GDP annually (D'Orey et al., 2014). For all the above reasons, the transport sector is of prime importance for achieving sustainable development as defined by the Brundtland Commission (United Nations, 1987). Transportation significantly affects all the pillars of sustainable development, namely environment, economy and society (Sdoukopoulos et al., 2019). For this reason, green papers have been devoted to urban mobility, in which environmental degradation is identified. According to the European Commission, there is a need to develop ecological transport systems that are environmentally and human-friendly (Ambroziak et al., 2014). To mitigate the aforementioned problems, the EU has introduced the concept of Sustainable Urban Mobility Plans (SUMPs), whose goal is to improve the accessibility of urban areas and to provide high-quality and sustainable mobility and transport through and within urban areas (European Commission, 2013). Also, the United Nations has defined a "Global indicator framework for the Sustainable Development Goals and targets of the 2030 Agenda for Sustainable Development" (United Nations, 2017). In this paper, we therefore aim to use spatiotemporal data for the development of a system that can contribute significantly to sustainable development by enhancing sustainable urban mobility through traffic congestion mitigation. The proposed system gathers spatiotemporal data from the GPS receivers of taxi vehicles, predicts travel times and navigates drivers based on these traffic predictions. Predictions of the future conditions of road segments contribute significantly to traffic congestion reduction (Lécué et al., 2013).
The objectives of the proposed system are as follows: (1) Development of a generic methodology beneficial for every urban road network. (2) Development of an automated smart system that improves itself without requiring human intervention. (3) Development of a GIS-based system independent of any external GIS software. The integration of GPS data into GIS for system development enables a wide variety of applications in land transportation, offering improved accuracy of spatial data, fast data transmission and low cost (Mintsis et al., 2004). (4) Development of a system that can produce an autonomous road network map, making the system independent of any external map. (5) Development of a tool that facilitates the navigation process by finding the optimum paths based on travel time predictions and fuel emissions. In the following, a literature review is presented (section 2), followed by a thorough presentation of the methodology (section 3). Section 4 presents a comparative analysis of the proposed system and other related systems. In section 5, the contribution of the proposed system to sustainable urban mobility is presented, followed by the conclusions (section 6).

Map-matching
Vehicle positioning is a crucial task, mainly in urban areas with narrow roads, tall buildings and other signal-degrading environments (Lakakis et al., 2014a). Thus, incorrect positioning is likely to happen (Jiménez et al., 2016). The map-matching process assigns the location of the vehicle to a road segment on a digital map. Several map-matching algorithms have been developed, which are categorised as follows:
− Geometric: these approaches use the geometry of the study area. The most common methodologies are point-to-point, point-to-curve and curve-to-curve.
− Topological: these use the geometry of the area, the connectivity and the continuity.
− Probabilistic: these use a confidence area, which may be ellipsoid or rectangular.
− Based on statistical models: these methods use statistical models such as Hidden Markov models or linear regression.
− Advanced algorithms: fuzzy logic, Kalman filter, neural networks.
Their accuracies vary from 85% to 98.5% (Kyriakou, 2017).
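As an illustration of the geometric category, a point-to-curve matcher can be sketched as follows. It projects a GPS fix onto every candidate road segment and keeps the nearest one. The segment identifiers, the local planar coordinates and the function names are hypothetical, and this sketch is not the matching procedure applied in this study.

```python
import math

def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p, and its distance to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: both end points coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)

def match_point(p, segments):
    """Point-to-curve matching: choose the segment nearest to the GPS fix."""
    best_id, best_point, best_dist = None, None, math.inf
    for seg_id, (a, b) in segments.items():
        q, d = project_onto_segment(p, a, b)
        if d < best_dist:
            best_id, best_point, best_dist = seg_id, q, d
    return best_id, best_point, best_dist

# Hypothetical road segments in a local planar frame (metres)
segments = {
    "S1": ((0.0, 0.0), (100.0, 0.0)),
    "S2": ((100.0, 0.0), (100.0, 80.0)),
}
seg_id, snapped, dist = match_point((40.0, 6.0), segments)
```

A purely geometric matcher of this kind ignores heading and connectivity, which is one reason the topological and probabilistic categories exist.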

Algorithms for traffic prediction
The literature review reveals a rich repertoire of algorithms that have been used for traffic prediction. The algorithms are categorised as parametric (Exponential smoothing, ARMA, Kalman Filter, etc.) and non-parametric. The most commonly used algorithms are the following: Spectral analysis, autoregressive integrated moving average model (ARIMA), Historical Average, Kalman Filter, K-Nearest Neighbours (KNN), Artificial Neural Networks and Random Forest. Selecting the optimum algorithm is a crucial and complicated task, as each algorithm has advantages and disadvantages. The historical average cannot respond immediately to unexpected events such as accidents, which are likely to occur in a road network. In the case of the Kalman filter, the accuracy decreases considerably when there are significant changes. For the ANN, the accuracy depends heavily on the training and, more precisely, on the quality of the data, the number of hidden layers and the number of epochs. ARIMA is sensitive to missing values (Kyriakou, 2017). The use of statistical indicators may suggest the most accurate algorithm for each case. However, at least two different statistical indicators are required to select the most effective algorithm (Shin, 2017).
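The need for more than one indicator can be illustrated with a short sketch. It ranks two hypothetical predictors by three common error measures (MAE, RMSE and MAPE); the choice of these three measures, the model names and all values are assumptions made for illustration and are not taken from this study.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

def mape(observed, predicted):
    """Mean absolute percentage error (observed values must be non-zero)."""
    return 100.0 * sum(
        abs((o - p) / o) for o, p in zip(observed, predicted)
    ) / len(observed)

def rank_algorithms(observed, predictions):
    """Return, for each indicator, the algorithm with the lowest error."""
    indicators = {"MAE": mae, "RMSE": rmse, "MAPE": mape}
    return {
        name: min(predictions, key=lambda alg: fn(observed, predictions[alg]))
        for name, fn in indicators.items()
    }

# Toy travel times in seconds (illustrative values only)
observed = [100.0, 200.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 400.0],
    "model_b": [100.0, 200.0, 440.0],
}
best = rank_algorithms(observed, predictions)
```

In this toy case the absolute measures favour `model_a`, whereas the relative measure favours `model_b`, so a single indicator would not settle the choice.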

Eco-routing
Many studies have shown that selecting different travel paths between the same origin and destination can contribute significantly to reducing the amount of fuel consumed and the emissions produced (Boriboonsomsin et al., 2012). The results for the case study of Thessaloniki showed that if 50 drivers had opted for the "eco-route", the produced emissions would have been reduced by 10% (Lakakis et al., 2015). Also, according to Ericsson et al., the "eco-route" selection can contribute to an 8.2% fuel saving (Ericsson et al., 2006). Finding the most environmentally friendly route is thus formulated as the "eco-routing" problem, and many methods have been proposed in recent years.

Methodology
The proposed system is designed to predict travel times and to navigate drivers based on the shortest travel time instead of the shortest distance, and it integrates a module for eco-navigation. As a first step, a data mining system is used to select the appropriate GPS data from the millions of records provided. Second, a map-matching process is implemented to assign the GPS points to the road segments with high accuracy. This process also leads to the production of a road network map. Then, the candidate algorithms are investigated to find the most accurate one for the current study, which is subsequently used to predict travel times. Travel times calculated from the GPS data are used both for the investigation of the algorithms and for the prediction through the selected one. Furthermore, an eco-route approach based on the calculated travel times is presented. Part of the proposed methodology is used for the development of a GIS-based application that navigates drivers based on the shortest travel time or the "green" path.

Study area
The study area is the urban road network of the city centre of Thessaloniki, the second biggest city in Greece. This area constitutes a typical central business district, as many land uses are concentrated there, such as residential areas, public services, health, shopping areas, industrial, commercial and service areas, education, tourism, churches etc. (GeoSpatial Enabling Technologies, 2016). Land uses affect the attracted direction, the traffic flow, the travel model and the public traffic demand. Approximately 2.4 million daily trips take place in the city of Thessaloniki (Papaioannou, 2012). According to estimates, 94,500 vehicles cross the main central arterials daily, and the average peak-hour velocity in the city centre is approximately 6 km/h. Private transportation has increased in recent years, reaching a share of 70%, and the average occupancy is only 1.1 persons per vehicle (Klotildi, 2014). For the implementation of the proposed methodology, the urban road network of Thessaloniki was divided into road segments, the criterion being the existence of a node with a traffic light. Figure 1 presents the 78 road segments, the 64 nodes and the hierarchy of roads. Table 1 presents the roads under study with their characteristics. Most of the road segments have one direction and one to four lanes.

Spatiotemporal data
Taxis, which use GPS receivers for navigation in the city, provided the required spatiotemporal data. While these taxis travel on the road network of Thessaloniki following the general traffic flow, they provide a large amount of GPS traces that include the vehicle's unique ID, coordinates, velocity, orientation, time and date. The Laboratory of Geodesy and Geomatics of the Faculty of Civil Engineering of the Aristotle University of Thessaloniki and the Compucon company collected the required data. The spatiotemporal data cover September 2007 (105,000 points per day) and September 2015 (1 million points per day). The drivers did not follow a predefined path; they travelled in the city without any restrictions. Figure 2 depicts the big data, which allow the road network of Thessaloniki to be recreated simply by plotting the GPS points. The data mining system includes, among others, the following steps:
− Conversion of GPS points to geospatial vector data: using a MatLab-based script, the classified data are stored as Shapefiles (shp), which are compatible with various GIS software.
− Creation of a database management system: the significant amount of data leads to a great number of shapefiles. Therefore, it is essential to structure them in a geodatabase, which has a comprehensive information model for representing and managing the data. For this step, ArcGIS software was used.
A map-matching process follows on the GPS points produced by the mining system. The map-matching was implemented through linear regression, as its accuracy is about 2.5 m for a confidence level of 99.5%. The outcome of this process was GPS points assigned with higher accuracy to the actual road segment on which they travel. At the same time, this process ensured the use of the proper traces for the study area. Through equation (1) of linear regression, a spatial model was also produced for each road segment using coordinates.

$$\hat{y} = \hat{\alpha} + \hat{\beta}\,x + \varepsilon \qquad (1)$$
where $\hat{y}$ is the predicted value for a specific value of $x$, $\hat{\alpha}$ and $\hat{\beta}$ are the estimated regression parameters and $\varepsilon$ is the error term. Table 2 depicts a sample of the produced spatial models for the years 2007 and 2015 for all the road segments of Niki's Avenue. Additionally, a prediction zone has been calculated for each road segment for a 97.5% confidence level. The smaller the prediction zone, the higher the accuracy. The prediction zones of the spatial models of 2015 are smaller than those of 2007, which is expected, as more data were available for 2015. The coefficient of determination $R^2$ has also been calculated to evaluate the accuracy of the spatial models. The high values of $R^2$ indicate the high accuracy of the spatial models.
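A spatial model of this kind can be sketched with an ordinary least-squares fit of one coordinate against the other. The function name and the coordinates below are hypothetical, and the sketch only illustrates the form of equation (1) and the $R^2$ coefficient; it does not reproduce the models of Table 2.

```python
def fit_spatial_model(xs, ys):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # intercept
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot       # coefficient of determination
    return alpha, beta, r2

# Hypothetical GPS points of one road segment in a local planar frame (metres)
xs = [0.0, 10.0, 20.0, 30.0, 40.0]
ys = [1.0, 6.2, 10.9, 16.1, 21.0]
alpha, beta, r2 = fit_spatial_model(xs, ys)
```

The sketch assumes that the segment is not parallel to the y axis, since the slope is undefined when all x values coincide; such segments would need the roles of the two coordinates to be swapped.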

Background map production of the road network and qualitative assessment
Implementing the map-matching process for every single road segment contributed to the creation of a road network map. Consequently, the proposed system itself can produce the required background map of the road network. This is very important, as it allows the system to be independent of an external background map. At the same time, it works as a "closed system", which uses input data to produce a map for navigation purposes. This is also vital because the system can adapt to every change in the road network; for instance, construction works may lead to the closure of road segments. The proposed system can detect this change, so it will not navigate the driver through the closed road segment. Finally, yet importantly, the system can improve itself without any human intervention. Every time new data are inserted into the system, the accuracy improves owing to the growing volume of data, so it is a smart system. A high-accuracy integrated system was also developed for the quality assessment of the produced map, using the ELLIPSE N inertial system of the SBG company. The GPS/INS system collects spatiotemporal data in communication with a set of supporting systems. Figure 4a presents the installation of the GPS/INS system on a private vehicle. The GPS was installed on the top of a private car, and the inertial system was installed inside the car exactly above the GPS. Both of them were fixed to avoid any movements that might contribute to noisy measurements. Figure 4b shows a sample of sensor data that were collected through a real-world field study on the road network in the urban environment of Thessaloniki. These data were used to calculate the confidence intervals based on the linear regression equations (2) and (3), where s is the estimator of variance and $t_{\alpha/2}$ is the value of the t distribution with n−2 degrees of freedom for significance level α. The smaller the confidence intervals for the parameters α and β, the higher the quality of the measurements.
The confidence intervals were used for the qualitative assessment of the produced map.
In Table 3, confidence intervals are presented for all the road segments of Venizelou Street, produced based on equations (2) and (3), and Table 4 shows the associated ranges of the confidence intervals. A plot was created for every single road segment to investigate whether the produced road network map lies between the calculated upper and lower limits. Figure 5 presents the qualitative assessment for all the road segments of Niki's Avenue. For every single road segment, the evaluation was positive. In some cases, the limits were extremely narrow, and this augmented the accuracy of the results.
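For reference, the textbook confidence intervals for the intercept and slope of a simple linear regression are given below. They hold under the usual assumptions of independent, normally distributed errors. The symbols $n$, $\bar{x}$ and $S_{xx}$ are introduced here for illustration, the significance level is written $\gamma$ to avoid confusion with the intercept, and it is an assumption that equations (2) and (3) take exactly this form.

```latex
\hat{\alpha} \;\pm\; t_{\gamma/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{\bar{x}^{2}}{S_{xx}}},
\qquad
\hat{\beta} \;\pm\; t_{\gamma/2,\,n-2}\; \frac{s}{\sqrt{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^{2},
\quad
s^{2} = \frac{1}{n-2} \sum_{i=1}^{n} \bigl(y_i - \hat{\alpha} - \hat{\beta}\,x_i\bigr)^{2}
```

Both interval widths shrink as $S_{xx}$ and $n$ grow, which is consistent with narrower limits for road segments that are covered by more GPS points.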

Travel time calculation and prediction
The method for the travel time calculation has been presented in detail by some of the co-authors in previous work. Table 5 shows the calculated statistical indicators for every algorithm. ANN was selected as the optimum algorithm, since it had the minimum values for three different indicators, as Table 5 shows. The architecture of the selected ANN models comprised one hidden layer with 17 neurons.
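A network with this shape can be sketched in a few lines. Only the single hidden layer of 17 neurons comes from the text; the five inputs mirror the variables the study reports taking into consideration (day, time, time period, homogeneous time zone and velocity), while the tanh activation, the learning rate, the training scheme and the synthetic data are all assumptions made for illustration.

```python
import math
import random

class OneHiddenLayerNet:
    """Feed-forward network: n_inputs -> n_hidden (tanh) -> 1 linear output."""

    def __init__(self, n_inputs, n_hidden=17, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr=0.02):
        """One stochastic gradient descent step on the squared error."""
        h = self._hidden(x)
        err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Error back-propagated through the tanh of hidden neuron j
            grad_hidden = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_hidden * xi
        self.b2 -= lr * err

def mse(net, samples):
    return sum((net.predict(x) - t) ** 2 for x, t in samples) / len(samples)

# Synthetic, scaled samples: [day, time, period, zone, velocity] -> travel time
rng = random.Random(1)
samples = []
for _ in range(40):
    x = [rng.random() for _ in range(5)]
    samples.append((x, 1.0 - x[4]))  # toy rule: travel time falls as velocity rises

net = OneHiddenLayerNet(n_inputs=5)
error_before = mse(net, samples)
for _ in range(200):
    for x, t in samples:
        net.train_step(x, t)
error_after = mse(net, samples)
```

In practice an established library would normally be preferred over a hand-written network; the sketch only makes the stated architecture concrete.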

Eco-routing
In the current research, the potential of integrating an eco-routing module into the proposed smart navigation system is investigated. The aim is not to calculate emissions with the highest possible accuracy; instead, the possibility of integrating a green navigation module into the proposed system is examined. The emissions of CO, NO and NO2, which are the most important emissions (Elefteriadou, 2013), were calculated from equations expressed in terms of the average vehicle velocity $\bar{v}$, which in turn is obtained from an equation involving the free-flow velocity and the traffic density in case of traffic congestion. However, there are some constraints on the use of these equations: the researcher should define threshold values for the free-flow velocity and the traffic density (Wilson, 2011).

Software development
Part of the proposed methodology was used for the development of an application that navigates drivers based on the shortest travel time or the "green" path. The proposed method and the custom code integrate the following disciplines: geoinformatics, spatiotemporal data analysis and optimum prediction algorithms. This integration constitutes an effective process for predicting travel times in every urban area. The GIS-based application is user-friendly and open-source, and it is characterised by an open architecture. Every user who needs to make a decision can use this application without being a specialist. The results can be exported to files that are compatible with GIS software. The software is named SmaRT-Ur-Navigation (SmaRT Urban Navigation). It runs on desktops and laptops, and it does not require the installation of any external software. Figure 6 depicts the architecture of the developed software and Figure 7 presents the relational databases that the software uses.
Figure 6. Software architecture, from input data (left) to navigation map (right)
Figure 7. Relational databases of the software

Case-study
A driver wishes to travel from node A to node B on a Friday night at 21.00. According to the proposed system, if the driver opts for travel time as the navigation criterion, the trip will cover 1.74 km and take 4 minutes and 49 seconds, while the produced emissions will be 35.72 ppm, as presented in Figure 8. If the driver opts for the "eco-route", the system proposes the path depicted in Figure 9. The total distance will be 1.68 km, the duration 5 minutes and 31 seconds, and the produced emissions 35.64 ppm. Therefore, in this case, the shortest path is the same as the eco-path but not the same as the shortest travel time path. Boriboonsomsin et al. noticed the same for the case study of Los Angeles: according to their study, the shortest route was the same as the eco-path but not the same as the shortest travel time path.
In a second scenario, the influence of traffic congestion on travel time is investigated. A driver needs to drive from node A to node B, a distance of 1.38 km, on a Tuesday morning at 11.00 and on the same day at 21.00. In the morning, the optimum path based on the shortest travel time criterion has a duration of 5 minutes and 54 seconds and produces 29.6 ppm of emissions. In contrast, a driver following the same path at night will need only 2 minutes and 47 seconds. This is expected, as the city centre of Thessaloniki is a central business district that concentrates many land uses, attracting a large number of trips and provoking traffic congestion. The peaking pattern of traffic congestion in the morning and afternoon hours is the same as the pattern of every big urban centre (National Association of City Transportation Officials, 2012).
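The way one routing procedure can serve the three navigation criteria can be sketched with Dijkstra's algorithm applied to different edge attributes. The use of Dijkstra's algorithm here is an assumption, since the text does not name the routing algorithm of the software, and the toy graph, its attribute names and all edge values are hypothetical; they are not the values of the case study.

```python
import heapq

def shortest_path(graph, source, target, criterion):
    """Dijkstra's algorithm on the edge attribute `criterion` (non-negative)."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, attrs in graph.get(node, {}).items():
            new_cost = cost + attrs[criterion]
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]

# Toy directed network: length in km, predicted time in s, emissions in arbitrary units
graph = {
    "A": {"C": {"length": 0.90, "time": 200.0, "emission": 18.0},
          "D": {"length": 0.80, "time": 180.0, "emission": 17.0}},
    "C": {"B": {"length": 0.80, "time": 130.0, "emission": 17.5}},
    "D": {"B": {"length": 0.85, "time": 170.0, "emission": 18.0}},
}
shortest, shortest_cost = shortest_path(graph, "A", "B", "length")
fastest, fastest_cost = shortest_path(graph, "A", "B", "time")
eco, eco_cost = shortest_path(graph, "A", "B", "emission")
```

With these toy values the shortest and the eco path coincide (A, D, B) while the fastest path differs (A, C, B), the same qualitative pattern as in the first scenario.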

Comparative analysis of related and proposed system
The most recent navigation system for the city of Thessaloniki is Google Maps from Google. This system provides the possibility to navigate based on travel times in real time. It also predicts travel times. However, its travel time predictions are not accurate for short distances. The system was tested for travel time predictions on a randomly chosen Monday at 18.00. It was asked to predict the travel time for short routes of 30 m, 100 m, 150 m and 290 m. The prediction was the same in each case, 1 minute. Consequently, the system cannot predict with high accuracy for the case study of Thessaloniki. Finally, yet importantly, the system does not provide the possibility of eco navigation. The next system that was studied is "WISERIDE", which was developed by the company Emisia in 2014 in cooperation with the Faculty of Civil Engineering of the Aristotle University of Thessaloniki. This system provides information about the current traffic conditions, the traffic volume and the associated emissions. Users can access the system both on a computer and on a smartphone and can choose to be navigated based on travel time, shortest path and economic path. There is no option for eco navigation. Besides, users cannot select the day but only the hour of travel. This system predicts travel time based on the relation between velocity and flow. It also predicts emissions by implementing the COPERT 4 methodology (Gavanas et al., 2014). The last system was Thessaloniki's Intelligent Urban Mobility Management System (Mobithess), which was developed by the Hellenic Institute of Transport, Centre for Research and Technology Hellas, in cooperation with the Municipality of Thessaloniki, Thessaloniki's Integrated Transport Authority, the Region of Central Macedonia and the Norwegian Centre for Transport Research. It is a web-based platform with a friendly graphical user interface.
This platform provides various criteria for navigation: environmentally friendly, economic, shortest path and travel time. However, only the time can be defined, so the user can use it only on the day of travel. The emissions are calculated using real-time sensors, while dynamic and static models are used for traffic prediction (Mitsakis et al., 2013). From the comparative analysis, it is concluded that all the related systems can work in real time, whereas the system proposed in the current research can work in almost real time if it is connected to a real-time data provider. SmaRT-Ur-Navigation, WISERIDE and Mobithess use taxi GPS data, while Google Maps uses the GPS positions of users who have a smartphone with the application of the same name. SmaRT-Ur-Navigation and Mobithess offer the "eco" path option, while Mobithess and WISERIDE provide the economic path option. Concerning the limitations, the proposed system requires a considerable amount of data to work. Google Maps does not have high accuracy for short distances. All the other systems depend on external road network maps, while SmaRT-Ur-Navigation itself produces the required maps, which makes it independent of any external map. Lastly, WISERIDE and Mobithess can make predictions only for the current day. Table 6 summarises the comparative analysis of the studied navigation systems.

Contribution of the proposed system to sustainable urban mobility
In the current study, some indicators were selected to assess the contribution of the smart navigation system to sustainable urban mobility. Sdoukopoulos et al. presented a methodological approach for finding the most representative indicators related to every single pillar of sustainable development: Economy (EC), Environment (EN) and Society (SC). Besides these pillars, indicators for safety were also considered, as safety is a crucial domain for urban mobility. They carried out a descriptive statistical analysis of various sustainable transport indicators, followed by a comprehensive analysis (Sdoukopoulos et al., 2019). Their outcome was selected because it is quite recent and covers a great variety of themes. Some indicators may have an impact on more than one pillar; for example, traffic congestion increases air pollution (EN), causes waste of time for passengers (SC) and has associated costs (EC) (WBCSD, 2015). Table 7 presents the indicators and their assessment. Figure 10 shows the overall contribution of the proposed system to sustainable urban mobility based on the selected indicators. The proposed system contributes significantly to sustainable urban mobility. More precisely, the proposed navigation system contributes mainly to the indicators of environment, economy, environment and social, and economy and social, with a percentage of more than 50%. The contribution to social indicators is 38%, and the contribution to the indicators that combine all the pillars (environment, economy and social) is 24%. In Figure 11, the type of contribution, direct or indirect, is presented. It is evident that the proposed system contributes directly to the indicators of environment, economy and social, and social and safety. On the contrary, it contributes indirectly to the other sectors, while it has no contribution to the sector of environment and economy.
Fig. 10. The overall contribution of the proposed navigation system to the indicators of sustainable urban mobility
Last, Figure 12 presents the degree of contribution to each sector based on a qualitative scale of high, medium and low. The indicators of the aforementioned sectors that are affected directly by the proposed system have a contribution degree from medium to high. The indicators of the other sectors are affected to a low degree, except for the sector of environment and economy, which is not influenced at all.

Conclusions
Sustainable urban mobility is a crucial issue that should form part of the vision for every urban area. Urban traffic congestion causes large-scale impacts, including air pollution, increased commute times and health issues, intensifying unsustainable mobility, since human and ecological values are at high risk.
In the framework of sustainable urban mobility, the current paper endeavours to contribute to the abovementioned considerations by proposing a solution to reduce urban traffic congestion. The main objectives were (1) to develop a generic methodology beneficial for every urban road network, (2) to develop a smart GIS-based system independent of external GIS software, (3) to establish a system that produces an autonomous road network map and (4) to develop a decision support tool for urban navigation based on predicted travel time and fuel emissions. Spatiotemporal data from the GPS receivers of taxis were collected and analysed to develop a navigation system based on travel time. The GPS taxi data contained vital information for the development of our system. The significant amount of data required the development of a data mining system. Thus, it was feasible for the proposed system to predict travel times and navigate drivers based on the shortest travel time. This contributes remarkably to decreasing traffic congestion and travel time, as it helps drivers not to waste their time in idling vehicles. The implementation of a map-matching process contributed to a more accurate data selection. This process also distinguishes the proposed system, as it generates an autonomous road network map by itself. Hence, the system produces an updated road network map that identifies any changes in the road network, making it possible to detect an incident or construction works that may have led to the closure of road segments. Consequently, the system contributes to drivers' safety, helping them to avoid incidents. The optimum algorithm for travel time prediction in the current case study was ANN, taking into consideration the day, the time, the time period, the homogeneous time zone and the velocity. The GIS-based software that was developed is user-friendly and does not require external GIS software; instead, it provides data for GIS software in a compatible format.
Finally, yet importantly, connecting a real-time data provider to the proposed system can yield an almost real-time system. The amount of data per road segment is a significant limitation of the proposed methodology: for road segments with inadequate data, it was not feasible to calculate enough travel times and, consequently, to make predictions. Furthermore, the implemented method for emission prediction was not optimal, leading to inaccurate predictions. However, accurate emission prediction was not the aim of this paper; instead, the possibility of predicting emissions based on travel times was investigated. For forecasts with higher accuracy, a more advanced method is required, and this constitutes another limitation of this research.
The innovative points of the current research can be summarised as follows: (1) It is the only system amongst the studied systems that implements a map-matching process for more accurate data selection.
(2) The system uses machine learning through the ANN and can be improved continuously.
(3) The system produces the necessary road network map using the GPS taxi data, thus permitting an immediate map update. (4) The software is open-source, freely distributed and has an open architecture, allowing changes and amendments by anyone. The other studied systems were open-source but not freely distributed.
To conclude, the proposed system is a valuable decision support tool, mainly for emergency vehicles such as ambulances and fire engines. Additionally, it is a vital tool for identifying traffic congestion spots and, at the same time, detecting possible locations of car accidents. Thus, stakeholders can use this tool to improve traffic signs and enhance road safety.