Multi-level Matching of Natural Language-Based Vehicle Retrieval

Liu, Ying; Zhang, Zhongshuai; Yang, Xiaochun

doi:10.1007/978-981-97-2387-4_24


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14333)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data


Abstract

Using natural language to retrieve vehicles of specific types and motion states in videos is of great significance for analyzing traffic conditions. However, both natural language descriptions and vehicle videos carry rich semantics, including static and dynamic information about the vehicles. In addition, the flexibility of natural language means that sentences with identical semantics can be expressed in many different ways. To make full use of this information, we divide the natural language and video data into different levels, separating the representation of overall information from that of local information. We propose information enhancement methods for each data level and then use representation learning networks to generate embeddings for the layered data. Finally, the overall cross-modal similarity is computed using weighted similarity measures. Experimental results show that the method improves the accuracy of retrieving vehicles in specific states from videos using natural language.
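The abstract states that the final cross-modal score is obtained by weighting similarity measures across data levels. A minimal sketch of one such scheme follows. It assumes that each level (hypothetically named "overall" and "local" here) already has a text embedding and a video embedding of equal dimension, and it combines per-level cosine similarities with normalized weights. The level names, the weight values, and the choice of cosine similarity are illustrative assumptions, not the authors' published configuration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(
    text_levels: Dict[str, List[float]],
    video_levels: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Hypothetical weighted combination of per-level cosine similarities.

    Weights are normalized to sum to 1, so the result stays in [-1, 1].
    """
    total_w = sum(weights.values())
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for level, w in weights.items():
        score += (w / total_w) * cosine(text_levels[level], video_levels[level])
    return score


def rank_tracks(
    query: Dict[str, List[float]],
    tracks: Dict[str, Dict[str, List[float]]],
    weights: Dict[str, float],
) -> List[str]:
    """Return track ids sorted by descending weighted similarity to the query."""
    scored = [(tid, weighted_similarity(query, feats, weights)) for tid, feats in tracks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tid for tid, _ in scored]


if __name__ == "__main__":
    # Toy 2-D embeddings; real systems would use learned, high-dimensional vectors.
    query = {"overall": [1.0, 0.0], "local": [0.0, 1.0]}
    tracks = {
        "a": {"overall": [1.0, 0.0], "local": [0.0, 1.0]},
        "b": {"overall": [0.0, 1.0], "local": [1.0, 0.0]},
        "c": {"overall": [1.0, 0.0], "local": [1.0, 0.0]},
    }
    weights = {"overall": 0.6, "local": 0.4}
    print(rank_tracks(query, tracks, weights))
```

With these toy inputs, track "a" matches the query at both levels, "c" matches only at the overall level, and "b" matches at neither, so the ranking is a, c, b. Normalizing the weights keeps scores comparable when the number of levels or the weight values change.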

The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007), Ten Thousand Talent Program (No. ZX20200035), Liaoning Distinguished Professor (No. XLYC1902057), and 111 Project (B16009).


Notes

1. https://spacy.io/


Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, China
Ying Liu & Zhongshuai Zhang
Northeastern University, Liaoning, China
Xiaochun Yang


Corresponding author

Correspondence to Ying Liu.

Editor information

Editors and Affiliations

Peng Cheng Laboratory, Shenzhen, China
Xiangyu Song
China University of Geosciences, Wuhan, China
Ruyi Feng
China University of Geosciences, Wuhan, China
Yunliang Chen
Deakin University, Burwood, VIC, Australia
Jianxin Li
University of Exeter, Exeter, UK
Geyong Min


About this paper

Cite this paper

Liu, Y., Zhang, Z., Yang, X. (2024). Multi-level Matching of Natural Language-Based Vehicle Retrieval. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14333. Springer, Singapore. https://doi.org/10.1007/978-981-97-2387-4_24


DOI: https://doi.org/10.1007/978-981-97-2387-4_24
Published: 28 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2386-7
Online ISBN: 978-981-97-2387-4
eBook Packages: Computer Science, Computer Science (R0)
