Abstract
Continuous top-k trajectory similarity search (CkSearch) is now commonly required in real-time, large-scale trajectory analysis, where it enables distributed stream processing engines to discover dynamic patterns. As a fundamental operator, CkSearch supports applications such as contact tracing during an outbreak and smart transportation. Although extensive efforts have been made to improve the efficiency of non-continuous top-k search, existing methods do not address two requirements of the continuous setting: dynamic indexing capability (R1) and incremental computing capability (R2). In this paper, we therefore propose Garden, a generic CkSearch-oriented framework for distributed real-time trajectory stream processing on Apache Flink. To address R1, we design a distributed dynamic spatial index called Y-index, which consists of a real-time load scheduler and a two-layer indexing structure. To address R2, we introduce a state reusing mechanism and index-based pruning methods that significantly reduce the computational cost. Empirical studies on real-world data validate the usefulness of our proposal and show a substantial advantage of our approach over state-of-the-art solutions in the literature.
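To make the idea of incremental computing (R2) concrete, the following is a minimal single-process sketch of continuous top-k search with per-trajectory state reuse. It is an illustration under stated assumptions, not Garden's implementation: the class name `ContinuousTopK`, its methods, and the choice of a directed Hausdorff distance from each streaming trajectory to a fixed query trajectory are all hypothetical and chosen only because that distance can be updated from the previous state when a new point arrives. It contains no distributed indexing, load scheduling, or pruning.

```python
import heapq
import math


class ContinuousTopK:
    """Toy continuous top-k search over a stream of trajectory points.

    Assumption: similarity is the directed Hausdorff distance from each
    streaming trajectory to a fixed query trajectory (smaller is more similar).
    """

    def __init__(self, query, k):
        self.query = list(query)  # fixed query trajectory: list of (x, y) points
        self.k = k
        self.state = {}           # trajectory id -> distance computed so far

    def _dist_to_query(self, point):
        # Distance from one point to its nearest query point.
        return min(math.dist(point, q) for q in self.query)

    def update(self, traj_id, point):
        # Reuse the stored distance: only the new point is compared with the
        # query, instead of recomputing over the whole trajectory.
        d = self._dist_to_query(point)
        self.state[traj_id] = max(self.state.get(traj_id, 0.0), d)
        return self.top_k()

    def top_k(self):
        # The k trajectories with the smallest current distance.
        return heapq.nsmallest(self.k, self.state.items(), key=lambda kv: kv[1])


search = ContinuousTopK(query=[(0.0, 0.0), (1.0, 0.0)], k=2)
search.update("a", (0.0, 1.0))
search.update("b", (1.0, 0.5))
search.update("c", (5.0, 0.0))
result = search.update("b", (1.0, 3.0))
print(result)
```

In this sketch each arriving point costs one pass over the query points, and the answer is refreshed after every update. A real system would also need to bound the stored state (for example with windows) and avoid scanning every trajectory when selecting the top k.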
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61802273 and 62102277, the Postdoctoral Science Foundation of China (No. 2020M681529), the Natural Science Foundation of Jiangsu Province (BK20210703), the China Science and Technology Plan Project of Suzhou (No. SYG202139), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX2_11342, KYCX22_3197).
About this article
Cite this article
Pan, Z., Chao, P., Fang, J. et al. Garden: a real-time processing framework for continuous top-k trajectory similarity search. Knowl Inf Syst 65, 3777–3805 (2023). https://doi.org/10.1007/s10115-023-01880-z
DOI: https://doi.org/10.1007/s10115-023-01880-z