A Social Media Based Approach for Route Planning During Urban Events

Traffic congestion is a major issue in most big cities, resulting in longer travel times and increased greenhouse gas emissions. Various factors can cause traffic congestion, including not only traffic events on roads (e.g., car accidents) but also urban events (e.g., football games, concerts, and festivals), where a large number of human activities happen in a certain place and at a certain time. The technology of connected vehicles (CV) has provided a crowd-sourcing platform that makes communication between vehicles and the sharing of surrounding information more timely and effective. Taking advantage of this, in this paper we focus on navigation during urban events, and present an approach to find feasible routes that avoid traffic congestion caused by different types of events. Using 12 months of geo-tagged tweets, we create a human activity network to capture certain types of human activities across cities. Based on that, an event estimation algorithm is developed to find the possible events that would occur in the near future, and to estimate their probabilities. These detected events are represented in the form of obstacle polygons with timestamps, and are used by the routing algorithm to generate congestion avoidance routes. We apply our approach to the road network of Toronto, Ontario, Canada, and the experimental results show the capability of our approach in supporting routing during urban events.


I. INTRODUCTION
Traffic congestion and jams are a common issue in most big cities in the world, and can be influenced by various factors such as traffic incidents, weather and special events [1]. Among them, urban events, such as football games, musical festivals, and strikes, which occur in certain places (i.e., hotspot areas) and at certain times, are one of the most important factors [2]. Similar urban events may successively arise at different locations within a certain period, which can subsequently cause traffic jams in the surrounding areas of those locations. For instance, an important sports game can increase the risk of traffic congestion in different urban areas where fans tend to get together to watch the game. Therefore, forecasting the locations of such potential activities/urban events and detecting their surrounding traffic conditions will effectively contribute to route planning, leading to a better travel experience.
The associate editor coordinating the review of this manuscript and approving it for publication was Tu Ngoc Nguyen.
Recently, the significant advance of ubiquitous computing has enabled us to capture human activities with broad coverage [3]. Meanwhile, the emerging technology of connected vehicles (CV) provides a practical means of effective real-time communication between vehicles. Together, they offer a promising way to fulfill the aforementioned goal of better dynamic routing that avoids traffic jams and congestion.
Modeling and prediction of traffic conditions is an important topic in the field of transportation. While considerable research focuses on the prediction of travel time or traffic volume, special attention has been devoted to the prediction of traffic congestion and jams. Inspired by social insect systems, [4] developed a model based on the pheromone paradigm to predict traffic congestion. Each car is considered a social insect that can release pheromone based on traffic information. Using similar concepts, [5] built a congestion forecast system based on a multi-agent system. An agent is set up at every road intersection and uses a pheromone mechanism for communication and coordination. Reference [6] proposed a Bayesian spatial joint model for crash prediction, considering the spatial correlations between heterogeneous types of entities (e.g., segments and intersections). Reference [7] presented a system that uses semantic Web technologies to predict the severity of road traffic congestion. The system integrates various traffic related information (e.g., weather information, road works) to improve the accuracy and consistency of traffic congestion prediction. Using floating car trajectory data, [8] proposed an approach that combines traffic flow prediction (with particle swarm optimization) and congestion state fuzzy division to estimate and predict urban traffic congestion. The models mentioned above mainly rely on the data collected by traffic sensors. Nevertheless, in many parts of road networks, real-time or historical traffic information is often unavailable due to a lack of traffic sensors, which limits the application of these prediction models.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Geo-social media platforms, such as Twitter and Facebook, which can provide a wide range of geo-related datasets, have been used in many studies for transportation system analysis [9]. One of the popular applications is to detect and model traffic events by analysing social media reports. Reference [10] developed a real-time monitoring system for traffic event detection based on Twitter stream analysis, using a support vector machine to classify traffic and non-traffic tweets. Similarly, [11] developed a system based on Twitter data and machine learning to detect traffic congestion. Using the Twitter public API, [12] developed an incident monitoring system called TrafficWatch. The system employed NLP (natural language processing) techniques to process tweets, and applied different machine learning algorithms to classify the detected events or incidents. Another application is to leverage social media data for the prediction of traffic events. Aiming at improving longer-term traffic prediction, [13] proposed an optimization framework, which extracted traffic-related indicators based on tweet semantics and applied them to traffic intensity prediction via linear regression. Reference [14] used tweets posted by particular organizations or governments to mine congestion correction and to forecast citywide traffic congestion. Reference [15] applied granular computing to transform traffic event information collected from social media data into probabilistic information granules for travel time estimation. The studies presented above provide a rich collection of methods for traffic modeling and prediction, but they only focus on traffic conditions occurring on roads. Little attention has been paid to the urban events that happen close to but not on roads and can also influence traffic conditions.
Vehicle routing problems with traffic congestion have been studied extensively in both academia and industry, and researchers have proposed various methods and strategies for avoiding traffic congestion or improving traffic management [16], [17]. Based on a bio-inspired approach, [18] used the ant algorithm to develop a dynamic routing system, which introduces a hierarchy among roads and maintains a table (populated with traffic data) for routing at each intersection. Similarly, the ant algorithm was applied by [19] to build an ant-based vehicle congestion avoidance system (AVCAS) to solve traffic congestion in vehicular networks. The developed system uses real-time traffic data to predict the average travel speed of roads and tries to find the least congested shortest paths in order to avoid congestion. By combining the Dijkstra algorithm and a heuristic algorithm (e.g., Particle Swarm Optimization), [20] proposed a hybrid vehicular re-routing strategy for vehicle traffic congestion avoidance, taking dynamic time constraints into consideration. To mitigate the impact of unexpected traffic congestion, [21] developed a vehicle rerouting system called Next Road Rerouting (NRR), which assists drivers in rerouting based on local traffic conditions and can disseminate its impacts on connected areas by using a multi-agent system. However, the congestion avoidance systems mentioned above do not consider urban events, which may not occur on roads but can still cause traffic congestion and jams.
In this study, we focus on navigation during urban events, and provide an approach to generate routes that can avoid traffic congestion caused by urban events. To support path planning, an event prediction model, which is based on modeling of human activity patterns, is used. By analyzing massive geo-tagged tweets, we group them into spatio-temporal clusters to capture human activities, and create a network of activity clusters, which are linked based on their semantic similarity and temporal similarity. Based on the relationships between human activity clusters, a search algorithm is developed to find the possible events and to calculate their occurrence probabilities. For route generation, we apply an A* based routing algorithm, which can take into account the temporal aspect of these predicted events and their influence on road networks.
The remainder of the paper is structured as follows. Section II presents the system architecture and the components proposed for supporting path planning. In Section III, we give details on the event prediction module, which is built on top of a human activity network. In Section IV, we provide a method that uses the predicted information of urban events to generate congestion avoidance routes. Section V shows the results of applying our approach to the case of the city of Toronto, Canada. In Section VI, we give our conclusions, discuss the limitations of our approach, and provide suggestions for future research.

II. SYSTEM ARCHITECTURE
In this section, a routing system integrated with an event prediction model is presented. As shown in Figure 1, the system architecture is composed of four main components: 1) Event prediction module; 2) Network update module; 3) Geo-database; and 4) Route planning module. For the event prediction module, we use a model developed by [22], called Human Activity Network, which is able to model human activity patterns in terms of space, time and semantics. Each node in the human activity network represents a human activity (urban event) in a place at a time. The semantics of a node can be used to infer what an activity is about. A group of connected nodes represents activities (urban events) that are similar in terms of time and semantics, where a state change of one node can affect the others. In our case, a state change refers to the occurrence of an event. The human activity network is developed based on geo-tagged tweets, which are stored in a geo-database and used by the event prediction module to predict the events that are linked with the known events. We extract the road network data from OpenStreetMap (OSM), and also store it in the geo-database. The network update module estimates the influence of these events based on the prediction of their occurrence and derives the availability of roads. In this study, we create a buffer around the location of each event to represent its affected area, and perform intersection operations between roads and buffers to find the roads that are blocked by these events. A routing algorithm is employed by the route planning module to plan routes avoiding the traffic jams and congestion caused by the events. Figure 2 demonstrates the route planning process.
Z. Wang, W. Huang: Social Media Based Approach for Route Planning During Urban Events

III. URBAN EVENT PREDICTION BASED ON HUMAN ACTIVITY PATTERNS
In this section, we present our approach that uses geo-tagged tweets to predict urban events that may cause traffic congestion and jams. By analyzing historic geo-tagged tweets, we apply the human activity network model to capture human activity patterns (see Section III-A). Based on the derived human activity model, we develop an algorithm to find the events that are associated with the known events and to estimate their occurrence probabilities (see Section III-B).

A. HUMAN ACTIVITY MODELING
We adopt the approach proposed in [22] (i.e., a human activity network) to model human activities and to extract the events used by our route planning algorithm. The idea of human activity modeling is that similar human activities can occur at different locations across a city. This aligns with our study, which tries to determine where certain activities (events) will subsequently occur when similar activities (events) are observed at other locations. Similar activities are defined as activities that happen during the same time period and have a similar semantic pattern. In [22], a tweet cluster represents an activity, and the topics inferred from a tweet cluster represent its semantic pattern. Figure 3 illustrates how urban events (human activities) are detected, forming an urban event network. First, spatio-temporal clustering is conducted for geo-tagged tweets, where each spatio-temporal cluster refers to an urban event in a place at a time. Then, the text of the tweets grouped in each cluster is used to infer the semantic pattern. Subsequently, the semantic similarity and the temporal similarity between the clusters are computed. Finally, similar clusters are further grouped based on the similarities (see the linked clusters in Figure 3c) to form the urban event network. As the degree of similarity of each pair of linked clusters varies, ranging from 0 to 0.5 (the closer to 0, the more similar), we normalize it to the range from 1 to 0 and use the result as the probability of the occurrence of an event when the linked activity is observed (Equation 1). Here, $p(b|a)$ refers to the probability of the occurrence of event b under the observation of event a, and $Sim_{ab}$ refers to the similarity between event a and event b. By applying Equation 1, the urban event network is transformed into a weighted graph. We can then effectively find where an event is most likely to occur next by locating the clusters linked to the observed one and ranking them by probability.
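Equation 1 maps a similarity score to a probability. A linear (min-max) normalization consistent with the stated ranges, offered here as an assumption rather than as the authors' exact expression, would be:

```latex
p(b \mid a) = 1 - \frac{Sim_{ab}}{0.5} = 1 - 2\,Sim_{ab}, \qquad 0 \le Sim_{ab} \le 0.5
```

Under this assumed mapping, two identical clusters ($Sim_{ab} = 0$) give $p(b|a) = 1$, the least similar linked clusters ($Sim_{ab} = 0.5$) give $p(b|a) = 0$, and $Sim_{ab} = 0.1$ gives $p(b|a) = 0.8$.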

B. URBAN EVENT ESTIMATION
Given a set of known events, we develop a breadth-first search based algorithm to find the events associated with the known events and to estimate their occurrence probabilities, based on the event relationship graph built from Twitter data (presented in Section III-A). Figure 4 shows the structure of the developed search algorithm. The algorithm maintains two sets: openSet, which is used to explore the event space, and foundSet, which stores the events that have been found. The occurrence probability of event v is represented by prob(v). Initially, all known events are inserted into openSet (line 2, Figure 4). The algorithm starts from the first element in openSet (line 4, Figure 4) and then expands it to its associated events. From line 5 to line 13 in the algorithm (Figure 4), a loop is used to search all the connected events and to update their occurrence probabilities, using the probability derived from Equation 1 (in Section III-A). For each related event v, the algorithm first estimates its occurrence probability, and then updates the probability of v if it is already included in openSet. If not, the event v is added to openSet for further expansion. The algorithm repeats the loop until openSet is empty, and returns foundSet as its output.
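The search described above can be sketched as follows. This is a minimal illustration, not a transcription of Figure 4. It assumes that known events have probability 1, that the probability of a linked event is the product of the probabilities along the path from a known event, and that an event already waiting in openSet keeps the larger of its old and new estimates. The function name `estimate_events` and the dictionary-based graph layout are hypothetical.

```python
from collections import deque


def estimate_events(graph, known_events):
    """Breadth-first estimation of event occurrence probabilities.

    graph: dict mapping an event to {linked_event: p(linked_event | event)},
           i.e. the weighted urban event network.
    known_events: iterable of events that have been observed.
    Returns a dict {event: estimated probability} for all found events.
    """
    # Assumption: an observed event is certain.
    prob = {v: 1.0 for v in known_events}
    open_set = deque(known_events)   # events still to be expanded
    found_set = set()                # events already expanded

    while open_set:
        u = open_set.popleft()
        found_set.add(u)
        for v, p_v_given_u in graph.get(u, {}).items():
            if v in found_set:
                continue  # already expanded, do not revisit
            # Assumption: probabilities chain multiplicatively along a path.
            candidate = prob[u] * p_v_given_u
            if v in open_set:
                # Already queued: keep the larger estimate (assumption).
                prob[v] = max(prob[v], candidate)
            else:
                prob[v] = candidate
                open_set.append(v)

    return {v: prob[v] for v in found_set}
```

For a graph with links a-b (0.8), a-c (0.3) and b-c (0.5) and the single known event a, this sketch first assigns c the value 0.3 via a and then raises it to 0.4 via b, since c is still in openSet when b is expanded.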

IV. ROUTE PLANNING
Given the predicted information of urban events (obtained from Section III), in this section we present our approach to generate the fastest feasible routes that avoid possible congestion caused by the urban events. Let G = (N, E) be a graph of the road network, which consists of a finite set of nodes N and a finite set of edges E. Each edge represents a road segment, and each node corresponds to a road junction. When events occur, parts of the road network will be affected by the events with certain probabilities. In this study, we assume that drivers differ in their willingness to use roads with a given probability of being blocked, and that they try to minimize the total travel time based on the predicted states of roads.
To estimate the influence of events, following previous research [23], we use a distance threshold of 500 meters to generate buffers around the event points to represent the areas affected by the events. Then a spatial intersection operation between the event affected areas and the road network is performed to determine all the affected roads and their associated events. Because a road can be affected by multiple events in surrounding areas, we represent this by a vector of (probability, time) pairs, and associate this vector with each edge e ∈ E, i.e., $PT = \big((p(v_1), t_{v_1}), (p(v_2), t_{v_2}), \ldots, (p(v_n), t_{v_n})\big)$, where each pair in PT is associated with a certain event v. Using the predicted information of events, we use the function presented in Figure 5 to derive the accessibility of each road. Considering that different drivers have different degrees of congestion aversion, we introduce a factor $P_a$ in this function to capture this effect. Here we assume that a road becomes inaccessible if the probability p of an associated event is larger than $P_a$, and that it remains closed from the time t at which the event takes place. The earliest time among the events whose probability is higher than $P_a$ is selected and used for routing.
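The two steps above (finding the affected roads, then deriving a closure time per road) can be sketched in plain Python. This is an illustration under stated assumptions, not the function of Figure 5: coordinates are taken to be planar and in meters, the buffer intersection is approximated by a point-to-segment distance check, and the names `affected_edges` and `closure_time` are hypothetical. A production system would use GIS buffer and intersection operations instead.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def affected_edges(edges, events, radius=500.0):
    """Build the PT vector of (probability, time) pairs for each affected edge.

    edges:  dict {edge_id: ((x1, y1), (x2, y2))} in meters.
    events: list of (x, y, probability, time) tuples.
    radius: buffer distance in meters (500 m in the paper).
    """
    pt = {}
    for edge_id, (a, b) in edges.items():
        for x, y, prob, time in events:
            # An edge intersects a circular buffer iff its distance to the
            # event point does not exceed the buffer radius.
            if point_segment_distance((x, y), a, b) <= radius:
                pt.setdefault(edge_id, []).append((prob, time))
    return pt


def closure_time(pt_vector, p_a):
    """Earliest time among events with probability above p_a, or None
    if no such event exists (the road is treated as always accessible)."""
    times = [t for p, t in pt_vector if p > p_a]
    return min(times) if times else None
```

With a tolerant driver (large `p_a`) fewer events count as obstacles, so `closure_time` returns a later time or `None`; a cautious driver (small `p_a`) sees the same road close earlier.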
With the derived information on the availability of roads, we apply the modified A* algorithm developed in [24] to generate the fastest routes that avoid the areas affected by events. The algorithm considers both static information (i.e., the topological constraints and spatial properties of the network) and dynamic information (i.e., the temporal information on the availability of roads) to derive the possible arrival time at each node, given the starting time and the speed of vehicles. Only the nodes that can be safely reached are explored to generate the final feasible routes. As mentioned earlier, it is assumed that roads are no longer available once they are identified as closed. Therefore, the vehicle has to move as fast as possible to avoid the roads blocked by the events, and waiting options are not considered in the algorithm. More details of the algorithm can be found in [24].
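The idea of routing against time-stamped closures can be illustrated with a small A* sketch. It is not the algorithm of [24], whose details are not given here. It assumes a constant vehicle speed, no waiting, planar coordinates in meters, and that an edge is usable only if the vehicle can finish traversing it no later than its closure time. The function name `time_aware_astar` and the example network are hypothetical.

```python
import heapq
import math


def time_aware_astar(nodes, edges, closures, start, goal, depart, speed):
    """Earliest-arrival A* over a road graph with edge closure times.

    nodes:    dict {node: (x, y)} in meters.
    edges:    dict {node: [(neighbor, length_m), ...]} (directed).
    closures: dict {(u, v): closure_time}; edges not listed never close.
    depart:   departure time; speed: meters per time unit.
    Returns (path, arrival_time), or None if no feasible route exists.
    """
    def heuristic(n):
        # Straight-line travel time: admissible if lengths >= Euclidean.
        return math.dist(nodes[n], nodes[goal]) / speed

    best = {start: depart}
    parent = {start: None}
    heap = [(depart + heuristic(start), depart, start)]

    while heap:
        _, t, u = heapq.heappop(heap)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], t
        if t > best.get(u, math.inf):
            continue  # stale queue entry
        for v, length in edges.get(u, []):
            arrive = t + length / speed
            close_at = closures.get((u, v))
            # Assumption: the edge must be cleared before it closes.
            if close_at is not None and arrive > close_at:
                continue
            if arrive < best.get(v, math.inf):
                best[v] = arrive
                parent[v] = u
                heapq.heappush(heap, (arrive + heuristic(v), arrive, v))
    return None


# Hypothetical example: direct road A-B-C, detour A-D-C, B-C closes at t=15.
NODES = {"A": (0, 0), "B": (1000, 0), "C": (2000, 0), "D": (1000, 1000)}
EDGES = {
    "A": [("B", math.dist(NODES["A"], NODES["B"])),
          ("D", math.dist(NODES["A"], NODES["D"]))],
    "B": [("C", math.dist(NODES["B"], NODES["C"]))],
    "D": [("C", math.dist(NODES["D"], NODES["C"]))],
}
CLOSURES = {("B", "C"): 15}
```

In this example, a vehicle departing at time 10 at 500 m per time unit clears B-C at time 14 and takes the direct road, while a vehicle at 250 m per time unit would only clear it at time 18 and is routed over D instead. Because roads only close and never reopen, arriving earlier at a node is never a disadvantage, which is why a single earliest-arrival label per node suffices in this sketch.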

V. IMPLEMENTATION AND APPLICATION RESULTS
Following the system architecture presented in Section II, a prototype routing system was implemented. Within the system, the open source GIS toolkit GeoTools (www.geotools.org) was employed to perform spatial data processing and to support extraction of the essential data for routing. All the needed data about events and the road network were stored and maintained in the relational database PostgreSQL with the extension PostGIS (www.postgis.org), based on the data model presented in [24]. We applied the proposed approach to the road network dataset of Toronto, [...] April 2015) were collected via the Twitter public streaming API. After data filtering, 3,684,980 tweets from 18,122 users were selected and used to build the model for prediction of the urban events. To assess the routing capability of our navigation system, we used an agent-based toolkit, GeoMASON [25], to simulate the movement of vehicles and the development of events. The calculated routes, together with the data about the events, are delivered to GeoMASON for simulation, and are displayed on OpenStreetMap for users. The results from the application are shown in the following sections.
FIGURE 7. Snapshots of the calculated route (in black) for vehicle V1. The areas that could be affected by urban events are colored in yellow, and will be colored in red if they are considered for routing and their occurrence time is reached. The vehicle is colored in blue. The destination is colored in green.
FIGURE 9. Snapshots of the calculated route (in black) for vehicle V3. The areas that could be affected by urban events are colored in yellow, and will be colored in red if they are considered for routing and their occurrence time is reached. The vehicle is colored in blue. The destination is colored in green.

A. URBAN EVENT NETWORK
After applying the urban event detection approach (presented in Section III-A) to the collected geo-tagged tweets, an urban event network of Toronto is established. As shown in Figure 6, each node indicates an inferred urban event, and linked nodes are considered similar urban events that can occur successively during a certain period.
In total, 448 urban events were detected across the city, with downtown Toronto (the red box in Figure 6) containing more events than other regions. Some similar events are relatively close to each other (see the nodes connected by the edges in blue), while other similar events are far away from each other (see the nodes connected by the edges in orange and brown). Those remotely similar events are mainly considered as obstacles, which are taken into account in our algorithm to find an alternative solution that avoids the potential traffic congestion areas.

B. ROUTING RESULTS
In this section, we first apply the event prediction algorithm (presented in Section III) to find relevant events and to estimate their probabilities, given a certain set of known events, and then use the proposed system to integrate this event information and generate routes. Here we assume that four vehicles are moving when these events occur. They travel from the same origin to the same destination and depart at the same time (T = 10 min), but the drivers of the vehicles have different preferences (indicated by $P_a$). Figures 7, 8, 9, and 10 display snapshots of the routes calculated for these vehicles, respectively, and the details of the route results are shown in Table 1. As these figures show, as the maximum acceptable congestion probability decreases, more detected events are considered as obstacles (in red), which have to be avoided during the routing process.
By integrating the profiles of the drivers, our system generates different routes for the involved vehicles, taking into account the obstacles caused by the predicted urban events. As shown in Table 1, vehicle V1, which accepts a higher congestion probability than V2 and V3, obtains a shorter route than these two vehicles. However, the figures also show that this route takes V1 through more areas (in yellow) that may be blocked by the events. The results also show that, given the different speeds of vehicles V2 and V4 and the temporal aspect of the events, the system calculated different routes for these two vehicles. As vehicle V4 is moving faster than V3, it can pass through the roads before they are affected by the events, which results in a shorter travel distance and time. The above results indicate that our approach can not only provide routes that avoid obstacles caused by the urban events, but also allow for route customization based on the profiles of drivers.

VI. CONCLUSION AND FUTURE WORK
Traffic congestion and jams are an important issue in urban environments, and are caused not only by traffic events (e.g., car accidents) but also by urban events such as football games, musical festivals, and strikes. If we know, before route planning, when and where certain urban events will affect the route planned for a trip, the potential traffic congestion caused by those events can be avoided to a certain degree when traveling. This would reduce travel time and cost. The recent advance of ubiquitous computing and the technology of CV has provided an opportunity to fill this gap. This paper focuses on navigation during urban events. We created a human activity network model to capture the connections between urban events. A routing system, which is combined with the human activity network model, was developed to generate routes that can avoid traffic congestion and jams caused by those events. The human activity network is built based on massive geo-tagged tweets, involving space, time and semantics. Using this activity network model, we estimated the probability, the location and the time of the occurrence of the events, and applied them to predict the status of roads. A modified A* algorithm was adopted to take into account the predicted information on road accessibility and to generate routes avoiding traffic congestion during urban events. We applied our approach to the city of Toronto, Canada, and the experimental results showed the potential of our approach for supporting navigation that avoids traffic congestion and jams.
Although our approach shows some promising capability of navigating vehicles in the presence of urban events, there are some important issues that should be pointed out and require further investigation. First, the model used for predicting urban events is limited by the data available from Twitter and may not capture all human activities in the cities. Other social media sources (e.g., Facebook) would be needed to improve the prediction model so that it covers more topics and more geographical areas. Second, we currently assume that all urban events have the same impact on the surrounding areas (the same buffer). However, different types of urban events may have different impacts, which change in space and time. In the future, we will seek an approach to quantify the impact of urban events, based on various event information, such as the type of events and the number of tweets. Third, we used the model of urban events to predict the traffic conditions on roads. Due to a lack of real traffic data, the prediction performance of the model has not yet been evaluated. One of the next steps is to test the proposed approach by comparing the results from the prediction model of urban events with data collected from real traffic measurements. Fourth, in the current study we assume that vehicles travel at a constant speed. However, the vehicle speed is largely dependent on traffic conditions and could change over time. One possible way to address this is to include a speed adjustment factor in the routing algorithm [26]. Last but not least, in this paper the occurrence probability of the events is used for drivers to select the roads they prefer. Some drivers may have different criteria for route determination, e.g., choosing a route with the minimum probability of being blocked by the events. Therefore, another research direction would be to develop a more sophisticated system by integrating different user requirements.