Mining and correlating traffic events from human sensor observations with official transport data using self-organizing-maps

https://doi.org/10.1016/j.trc.2016.10.010

Highlights

  • SOM result revealed latent temporal relationships and varying daily traffic disruption patterns.

  • Strong correlation of traffic-related, georeferenced tweets with special events (r = 0.73), and traffic incidents (r = 0.59) from official TIMS traffic data.

  • No correlation of traffic-related, georeferenced tweets with traffic volume (r = −0.19) and works (r = −0.10) disruptions.

Abstract

Cities are complex systems in which related human activities are increasingly difficult to explore. In order to understand urban processes and to gain deeper knowledge about cities, location-based social networks like Twitter offer a promising way to explore latent relationships of underlying mobility patterns. In this paper, we therefore present an approach using a geographic self-organizing map (Geo-SOM) to uncover and compare previously unseen patterns from social media and authoritative data. The results, which we validated with Live Traffic Disruption (TIMS) feeds from Transport for London, show that the observed geospatial and temporal patterns of special events (r = 0.73), traffic incidents (r = 0.59) and hazard disruptions (r = 0.41) from TIMS are strongly correlated with traffic-related, georeferenced tweets. Hence, we conclude that tweets can be used as a proxy indicator to detect collective mobility events and may help to provide stakeholders and decision makers with complementary information on complex mobility processes.

Introduction

The complexity of cities and their related human activities makes it increasingly challenging for policy makers and modelers to explore urban dynamics and to study city-scale mobility patterns. One promising example of available high-granularity information sources is Transport for London’s (TfL) Traffic Information Management System (TIMS). It provides high-resolution, up-to-date disruption information regarding congestion, traffic incidents, events, construction works and other issues affecting traffic. However, existing traffic measuring systems (e.g., road-side detectors, video surveillance, floating car data) are resource intensive in terms of ongoing operating and maintenance costs. Furthermore, a complete detection of all traffic and road conditions is simply not feasible.

At the same time, an increasing amount of information has been generated through mobile devices in recent years, becoming a potentially powerful data resource for (geographic) knowledge discovery and human behavior analysis from crowdsourced data (Goodchild, 2007). For a number of disciplines this development opens up enormous potential for various applications, including urban and traffic planning as well as disease and disaster management.

In particular, harnessing human mobility information from social media platforms such as Twitter can potentially lead to new insights into the human mobility process. Due to the high spatiotemporal resolution this may provide complementary information when compared with existing traffic data sources. However, one main challenge when analyzing mobility with officially acquired data is the spatiotemporal complexity of latent processes within traffic events (e.g., effects of incidents such as roadworks on the traffic flow and the correlations with traffic disruptions), hampering the detection of patterns in large road networks (Asif et al., 2014).

Simultaneously, when using crowdsourced information it is uncertain how representative and trustworthy these new types of geodata are for the inference of human mobility patterns (Steiger et al., 2015c). Thus, research in this area requires new methodological approaches, which consider the high dimensionality and uncertainty of crowdsourced geographic information in the context of a data-driven geography (Miller and Goodchild, 2014). In a previous study, we therefore applied self-organizing maps (SOMs) and demonstrated their efficiency in abstracting and clustering information from multidimensional Twitter data in a trans-disciplinary approach (Steiger et al., 2015b). However, it has not been analyzed whether spatiotemporal information from social media data is a suitable proxy for inferring certain traffic-related events. A further question is whether the results, in comparison with official traffic disruption reports, lead to new insights regarding the study of human mobility patterns.

In this paper, we use a non-geographic and a geographic self-organizing map (SOM/Geo-SOM) to discover collective human mobility clusters by analyzing similar variances within geospatial, temporal and disruption characteristics from live traffic feeds. The results are correlated with traffic-related georeferenced tweets for a case study in London. We intend to answer the following research questions (RQ):

(RQ1): What is the correlation between inferred spatiotemporal clusters from tweets, as a proxy of collective human mobility patterns, and the real-time traffic information provided by TIMS?

(RQ2): Which official traffic events along with their individual traffic disruption characteristics (category, severity, duration) are reflected in traffic-related tweets and have a dissimilar/similar spatiotemporal distribution?
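
The correlation asked for in RQ1 can be illustrated with a minimal sketch. It assumes that the reported r values are Pearson correlation coefficients and that tweets and TIMS disruptions have already been counted per spatiotemporal cluster; neither assumption is confirmed by the text above, and the counts below are made-up illustrative values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts per cluster (illustrative only, not from the paper).
tweets_per_cluster = [12, 30, 7, 45, 22, 3]
events_per_cluster = [2, 6, 1, 9, 3, 1]

r = pearson_r(tweets_per_cluster, events_per_cluster)
print(f"r = {r:.2f}")
```

A value close to 1 would indicate that clusters with many traffic-related tweets also contain many disruptions of the given category, while a value near 0 would indicate no linear relationship.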

Section snippets

Background

This section summarizes the characteristics of both datasets used in this analysis (Section 2.1, "Comparative reference dataset: TIMS disruption messages", and Section 2.2, "Social media dataset: Twitter"). Related work in the area of spatiotemporal mobility analysis is then reviewed in Section 2.3, followed by a description of the current state of the art regarding the application of SOMs for mobility analysis in Section 2.4.

Methodology

Our methodological approach intends to leverage existing knowledge about the spatiotemporal characteristics of traffic disruptions from official data in order to determine which patterns of inferred disruption clusters are similarly reflected within georeferenced tweets. Therefore, we compute the similarity between the three available information layers for traffic disruptions: the categorical attributes for each disruption (category of event, severity and duration), the geographic location and

Case study and results

The analysis framework described in Section 3 has been applied in our case study for official traffic disruptions and the Twitter dataset. This section summarizes the results of our analysis.

For our case study of Greater London, we analyzed and compared 129,651 real-time traffic disruptions from Transport for London with 63,407 georeferenced tweets for one month. For the Twitter data acquisition process, only georeferenced tweets within a given bounding box (see Table 2 for further description)

Discussion of results and applied methods

The SOM result in Section 4.1 revealed varying durations of observed disruption patterns depending on the time of their occurrence. Detected incidents and corresponding types of disruptions differ in severity and duration throughout the day on weekdays and weekends, reflecting the bimodal (peak hour) distribution of human mobility (Wang et al., 2014). The SOM helps to explore these temporal variations of input attributes by topologically grouping dissimilar and similar disruptions

Conclusion and future work

This paper presents the results of a combined SOM/Geo-SOM analysis framework for the detection of distinctive mobility disruption patterns from official TIMS messages and their comparison with traffic-relevant, georeferenced Twitter messages.

We chose a SOM (Section 4.1) and a Geo-SOM (Section 4.2) in order to assess the non-geospatial components and their combination with geospatial components separately.
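
The difference between the two map types can be sketched through their best-matching-unit (BMU) search. The following is a minimal illustration, not the implementation used in this study: it assumes the commonly described Geo-SOM scheme in which the geographically closest unit is found first and the final BMU is then searched only within a grid radius k around it. The unit layout, the field names (`pos`, `geo`, `attr`), the Chebyshev grid distance, and the choice to compare the full vector in the second step are all assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def som_bmu(units, sample):
    """Standard SOM search: closest unit on non-geographic attributes only."""
    return min(range(len(units)),
               key=lambda i: sq_dist(units[i]["attr"], sample["attr"]))


def geosom_bmu(units, sample, k):
    """Geo-SOM style search: restrict candidates to the grid neighbourhood
    (radius k) of the geographically closest unit, then compare full vectors."""
    geo_winner = min(range(len(units)),
                     key=lambda i: sq_dist(units[i]["geo"], sample["geo"]))
    win_row, win_col = units[geo_winner]["pos"]
    candidates = [
        i for i, u in enumerate(units)
        if max(abs(u["pos"][0] - win_row), abs(u["pos"][1] - win_col)) <= k
    ]
    return min(candidates,
               key=lambda i: sq_dist(units[i]["geo"] + units[i]["attr"],
                                     sample["geo"] + sample["attr"]))


# Hypothetical 1 x 4 map: grid position, geographic weights, attribute weights.
units = [
    {"pos": (0, 0), "geo": [0.0, 0.0], "attr": [0.9]},
    {"pos": (0, 1), "geo": [1.0, 0.0], "attr": [0.5]},
    {"pos": (0, 2), "geo": [2.0, 0.0], "attr": [0.5]},
    {"pos": (0, 3), "geo": [3.0, 0.0], "attr": [0.1]},
]
# Hypothetical disruption record: location plus one normalized attribute.
sample = {"geo": [0.1, 0.0], "attr": [0.1]}

print("SOM BMU:", som_bmu(units, sample))
print("Geo-SOM BMU (k=1):", geosom_bmu(units, sample, k=1))
```

In this toy map the attribute-only search selects the unit with the most similar attribute value even though it is geographically distant, whereas the Geo-SOM style search stays near the sample's location. With k equal to the map size the restriction disappears and the search covers all units; with k = 0 the mapping is purely geographic.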

First, we uncovered latent temporal relationships of traffic disruption properties and

Acknowledgements

This research has been funded through the graduate scholarship program “Crowdanalyser: spatiotemporal analysis of user-generated content”, supported by the state of Baden-Württemberg. This research has been supported by the Klaus Tschira Stiftung gGmbH. We thank the anonymous reviewers for their constructive and helpful suggestions. Furthermore, we thank Transport for London for providing freely available real-time transportation data licensed under the Open Government License v.2.0.

References (53)

  • D. Blei et al. Latent Dirichlet allocation. J. Mach. Learn. Res. (2003)
  • E. Cho et al. Friendship and mobility
  • Couronne, T., Beuscart, J., Chamayou, C., 2013. Self-Organizing Map and Social Networks: Unfolding Online Social...
  • J. Cranshaw et al. The livehoods project: utilizing social media to understand the dynamics of a city
  • C.-C. Feng et al. Combining Geo-SOM and hierarchical clustering to explore geospatial data. Trans. GIS (2014)
  • L. Ferrari et al. Extracting urban patterns from location-based social networks
  • A. Fotheringham et al. The modifiable areal unit problem in multivariate statistical analysis. Environ. Plan. A (1991)
  • S. Gao. Spatio-temporal analytics for exploring human mobility patterns and urban dynamics in the mobile age. Spat. Cogn. Comput. (2014)
  • M. Gonzalez et al. Understanding individual human mobility patterns. Nature (2008)
  • M. Goodchild. Citizens as sensors: the world of volunteered geography. GeoJournal (2007)
  • J. Gorricha et al. A framework for exploratory analysis of extreme weather events using geostatistical procedures and 3D self-organizing maps. Int. J. Adv. Intell. Syst. (2013)
  • Hagenauer, J., Helbich, M., Leitner, M., 2010. Visualization of crime trajectories with self-organizing maps: a case...
  • B. Hawelka et al. Geo-located Twitter as proxy for global mobility patterns. Cartogr. Geogr. Inf. Sci. (2014)
  • M. Helbich et al. Exploration of unstructured narrative crime reports: an unsupervised neural network and point pattern analysis approach. Cartogr. Geogr. Inf. Sci. (2013)
  • B. Jiang et al. Selection of streets from a network using self-organizing maps. Trans. GIS (2004)
  • B. Jiang et al. Characterizing the human mobility pattern in a large street network. Phys. Rev. E (2009)

This article belongs to the Virtual Special Issue on “Data-driven smart-city-enabled traffic system modeling, analysis, and optimization”.