Abstract
We present a framework for extracting spatiotemporal trip typologies from noisy mobile ticketing boarding data sampled from passengers in a bus network, using the Pioneer Valley Transit Authority in Massachusetts as a case study. We first used a greedy approach to infer bus boarding stops. Next, we calculated the multi-dimensional dissimilarity of passenger activation time series using the AWarp alignment algorithm for sparse time series. We then applied hierarchical clustering to these dissimilarities to discover spatiotemporal patterns, which yielded four distinct trip pattern typologies. We analyzed the typologies by trip length and duration, seasonality and other temporal distributions, spatial distributions, and fare type. Three typologies were linked to regular commuters and were distinguished by boarding time or transfer tendency. The fourth typology was primarily associated with leisure or other activities. Our typology method provides valuable insights into passenger behavior and can help planners estimate demand. Further, we demonstrate its potential to support decision-making at other regional transit authorities with limited passenger data.
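The dissimilarity-then-clustering steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it substitutes a plain one-dimensional dynamic time warping (DTW) distance for AWarp, assumes average linkage, and uses hypothetical names (`dtw_distance`, `cluster_passengers`) and toy data chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Plain DTW distance between two 1-D series (a stand-in, not AWarp)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def cluster_passengers(series, n_clusters):
    """Group series by hierarchical clustering on pairwise DTW distances."""
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    # Average linkage is an assumption; it accepts arbitrary dissimilarities.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy activation series for four hypothetical passengers.
toy_series = [[0, 1, 0, 0], [0, 0, 1, 0], [5, 5, 5, 5], [5, 5, 6, 5]]
labels = cluster_passengers(toy_series, n_clusters=2)
```

In this toy input the first two series differ only by a one-step time shift, which warping absorbs, so they fall in the same cluster. The framework itself works with multi-dimensional, sparse series, for which a sparse-aware alignment such as AWarp avoids much of the quadratic cost shown here.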
Data Availability
The data used in the analyses were provided by the Pioneer Valley Transit Authority (PVTA). They are not publicly available because they contain sensitive personal information that cannot be shared without the explicit consent of the individuals involved. The code and documentation for generating the results presented in this paper are available via this GitHub repository.
Funding
The research leading to these results received funding from the Federal Transit Administration under Grant Agreement ID FAIN MA-2021-012-00.
Author information
Contributions
The authors confirm their contribution to the paper as follows: study conception and design: JO, MM; data collection: MM; analysis and interpretation of results: MM, JO; draft manuscript preparation: MM, JO. Both authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing interests.
Ethical Approval
This study did not require ethical approval as it did not involve any human or animal subjects.
About this article
Cite this article
Abdalazeem, M., Oke, J. Extracting Spatiotemporal Bus Passenger Trip Typologies from Noisy Mobile Ticketing Boarding Data. Data Sci. Transp. 5, 20 (2023). https://doi.org/10.1007/s42421-023-00082-x
DOI: https://doi.org/10.1007/s42421-023-00082-x