
Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Abstract

Earlier trajectory prediction approaches focus on capturing sequential structures among pedestrians with recurrent networks, which are known to have limitations in capturing long sequence structures. To address this limitation, some recent works have proposed Transformer-based architectures, which are built on attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL, which captures cross-sequence trajectory structures via self-supervised pre-training; this pre-training plays a crucial role in improving both the data efficiency and the generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on ETH/UCY and SDD datasets in terms of Average Displacement Error and Final Displacement Error (code available at https://github.com/Sigta678/Social-SSL).
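For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Average Displacement Error (ADE) and Final Displacement Error (FDE) are commonly computed for a single predicted trajectory. It uses the standard definitions (mean Euclidean error over all predicted time steps, and Euclidean error at the last time step). The function name displacement_errors and the toy coordinates are illustrative assumptions; the abstract does not state the paper's exact evaluation protocol (units, number of sampled predictions, and so on), so this should not be read as the authors' evaluation code.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred and truth are equal-length lists of (x, y) positions,
    one entry per future time step. Hypothetical helper for illustration.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    ade = sum(dists) / len(dists)  # mean error over all time steps
    fde = dists[-1]                # error at the final time step only
    return ade, fde


# Toy example: the prediction goes straight while the true path drifts upward.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # prints "1.0 2.0"
```

In the toy example the per-step errors are 0, 1, and 2, so the ADE is their mean and the FDE is the last value. Lower is better for both metrics.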


Notes

  1. In the experiments, r is empirically set to half of the input length (see the Appendix).


Acknowledgement

This work was supported in part by the Ministry of Science and Technology of Taiwan under grant numbers MOST-109-2221-E-009-114-MY3, MOST-110-2221-E-A49-164, MOST-109-2223-E-009-002-MY3, MOST-110-2218-E-A49-018, and MOST-111-2634-F-007-002, and in part by the QUALCOMM TAIWAN UNIVERSITY RESEARCH 2021 PROGRAM (NYCU). We are grateful to the National Center for High-performance Computing for computer time and facilities.

Corresponding author

Correspondence to Li-Wu Tsao.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 15457 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tsao, LW., Wang, YK., Lin, HS., Shuai, HH., Wong, LK., Cheng, WH. (2022). Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_14

  • DOI: https://doi.org/10.1007/978-3-031-20047-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20046-5

  • Online ISBN: 978-3-031-20047-2

  • eBook Packages: Computer Science, Computer Science (R0)
