Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction

Tsao, Li-Wu; Wang, Yan-Kai; Lin, Hao-Siang; Shuai, Hong-Han; Wong, Lai-Kuan; Cheng, Wen-Huang

doi:10.1007/978-3-031-20047-2_14

Conference paper
First Online: 23 October 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Abstract

Earlier trajectory prediction approaches focus on capturing sequential structures among pedestrians with recurrent networks, which are known to have limitations in capturing the structure of long sequences. To address this limitation, some recent works have proposed Transformer-based architectures, which are built on attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL, which captures cross-sequence trajectory structures via self-supervised pre-training; this pre-training plays a crucial role in improving both the data efficiency and the generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on the ETH/UCY and SDD datasets in terms of Average Displacement Error (ADE) and Final Displacement Error (FDE). Code is available at https://github.com/Sigta678/Social-SSL.
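The two evaluation metrics named in the abstract have standard definitions in the trajectory-prediction literature: ADE is the mean Euclidean distance between predicted and ground-truth positions over all agents and all future time steps, and FDE is the same distance measured only at the final predicted step. The following is a minimal NumPy sketch of those standard definitions, not code from the Social-SSL repository; the function name `ade_fde` and the (agents, future steps, 2) array layout are illustrative assumptions.

```python
import numpy as np


def ade_fde(pred, gt):
    """Compute Average and Final Displacement Error (illustrative sketch).

    Assumed layout (hypothetical, not taken from the Social-SSL code):
    pred, gt have shape (num_agents, num_future_steps, 2), holding
    predicted and ground-truth (x, y) positions.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape or pred.ndim != 3 or pred.shape[-1] != 2:
        raise ValueError("expected pred and gt with matching shape (N, T, 2)")

    # Per-agent, per-step Euclidean error, shape (N, T).
    dist = np.linalg.norm(pred - gt, axis=-1)

    ade = float(dist.mean())          # averaged over all agents and steps
    fde = float(dist[:, -1].mean())   # averaged over agents at the last step
    return ade, fde


# Usage: two agents, two future steps each.
pred = [[[0, 0], [1, 1]], [[3, 4], [0, 0]]]
gt = [[[0, 0], [1, 0]], [[0, 0], [0, 0]]]
print(ade_fde(pred, gt))  # -> (1.5, 0.5)
```

Stochastic predictors are often evaluated with a best-of-K variant of these metrics; the abstract does not state which evaluation protocol the reported numbers use.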

Notes

1. In the experiments, r is empirically set to half of the input length (see Appendix).

Acknowledgement

This work was supported in part by the Ministry of Science and Technology of Taiwan under grant numbers MOST-109-2221-E-009-114-MY3, MOST-110-2221-E-A49-164, MOST-109-2223-E-009-002-MY3, MOST-110-2218-E-A49-018, and MOST-111-2634-F-007-002, and in part by the QUALCOMM TAIWAN UNIVERSITY RESEARCH 2021 PROGRAM (NYCU). We are grateful to the National Center for High-performance Computing for computer time and facilities.

Author information

Authors and Affiliations

National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Li-Wu Tsao, Yan-Kai Wang, Hao-Siang Lin, Hong-Han Shuai & Wen-Huang Cheng
Multimedia University, Cyberjaya, Malaysia
Lai-Kuan Wong

Corresponding author

Correspondence to Li-Wu Tsao.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 15457 KB)

About this paper

Cite this paper

Tsao, LW., Wang, YK., Lin, HS., Shuai, HH., Wong, LK., Cheng, WH. (2022). Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_14

DOI: https://doi.org/10.1007/978-3-031-20047-2_14
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science, Computer Science (R0)