
SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements

  • Conference paper
Image Analysis and Processing – ICIAP 2023 (ICIAP 2023)

Abstract

We present SCENE-pathy, a dataset and a set of baselines for studying the visual selective attention (VSA) of people towards the 3D scene in which they are located. In practice, VSA reveals which parts of the scene are most attractive to an individual. Capturing VSA is of primary importance in marketing, retail management, surveillance, and many other fields. So far, VSA analysis has focused on very simple scenarios: a mall shelf or a tiny room, usually with a single subject involved. Our dataset, instead, considers a multi-person and much more complex 3D scenario, specifically a high-tech fair showroom presenting the machines of an Industry 4.0 production line, where 25 subjects were captured for 2 min each while moving, observing the scene, and having social interactions. The subjects also filled out a questionnaire indicating which part of the scene was most interesting to them. Data acquisition was performed with HoloLens 2 devices, which provided ground-truth data on people's tracklets and gaze trajectories. Our proposed baselines capture VSA from RGB video data and a 3D scene model alone, producing interpretable 3D heatmaps. In total, there are more than 100K RGB frames with, for each person, annotated 3D head positions and 3D gaze vectors. The dataset is available at https://intelligolabs.github.io/scene-pathy.
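For intuition, the following is a minimal sketch, not the authors' actual baseline, of how a 3D attention heatmap could be accumulated from annotated 3D head positions and gaze vectors over a 3D scene model. The point-cloud representation, function names, angular threshold, and range cutoff are illustrative assumptions.

import numpy as np

def accumulate_vsa_heatmap(scene_points, head_positions, gaze_vectors,
                           angle_thresh_deg=10.0, max_range=15.0):
    """Hypothetical example: vote on scene points lying inside a person's gaze cone.

    scene_points: (P, 3) points of the 3D scene model.
    head_positions, gaze_vectors: (N, 3) per-frame 3D annotations for one person.
    Returns per-point attention scores of shape (P,), normalised to [0, 1].
    """
    heat = np.zeros(len(scene_points), dtype=np.float64)
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    for head, gaze in zip(head_positions, gaze_vectors):
        gaze = gaze / (np.linalg.norm(gaze) + 1e-8)
        offsets = scene_points - head                        # rays from the head to each scene point
        dists = np.linalg.norm(offsets, axis=1) + 1e-8
        cos_sim = (offsets @ gaze) / dists                   # alignment with the gaze direction
        hit = (cos_sim > cos_thresh) & (dists < max_range)   # inside the gaze cone and within range
        heat[hit] += cos_sim[hit]                            # soft vote weighted by alignment
    return heat / max(heat.max(), 1e-8)                      # normalise for visualisation

Aggregating such per-person scores across all subjects and colouring the scene points accordingly yields an interpretable 3D heatmap of the kind the paper describes.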



Acknowledgments

This work was partially supported by the Italian MIUR within PRIN 2017, Project Grant 20172BH297: I-MALL - improving the customer experience in stores by intelligent computer vision, and by the PNRR research activities of the consortium iNEST (Interconnected North-Est Innovation Ecosystem) funded by the European Union NextGenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) - Missione 4 Componente 2, Investimento 1.5 - D.D. 1058 23/06/2022, ECS_00000043). This manuscript reflects only the Authors' views and opinions; neither the European Union nor the European Commission can be considered responsible for them.

Author information


Corresponding author

Correspondence to Andrea Toaiari.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Toaiari, A. et al. (2023). SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_30


  • DOI: https://doi.org/10.1007/978-3-031-43148-7_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43147-0

  • Online ISBN: 978-3-031-43148-7

  • eBook Packages: Computer Science, Computer Science (R0)
