Abstract
We present SCENE-pathy, a dataset and a set of baselines for studying the visual selective attention (VSA) of people towards the 3D scene in which they are located. In practice, VSA makes it possible to discover which parts of the scene are most attractive to an individual. Capturing VSA is of primary importance in marketing, retail management, surveillance, and many other fields. So far, VSA analysis has focused on very simple scenarios: a mall shelf or a tiny room, usually with a single subject involved. Our dataset, instead, considers a multi-person and much more complex 3D scenario, specifically a high-tech fair showroom presenting the machines of an Industry 4.0 production line, where 25 subjects were captured for two minutes each while moving, observing the scene, and having social interactions. The subjects also filled out a questionnaire indicating which part of the scene was most interesting to them. Data acquisition was performed with HoloLens 2 devices, which allowed us to obtain ground-truth data on people's tracklets and gaze trajectories. Our proposed baselines capture VSA from RGB video data and a 3D scene model alone, providing interpretable 3D heatmaps. In total, the dataset contains more than 100K RGB frames with, for each person, annotated 3D head positions and 3D gaze vectors. The dataset is available here: https://intelligolabs.github.io/scene-pathy.
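To give an intuition of how per-frame 3D head positions and gaze vectors can be aggregated into an interpretable 3D heatmap over a scene model, the following is a minimal sketch, not the paper's actual baseline: every scene point whose direction from the head lies within a fixed angular cone around the gaze vector accumulates one attention "hit" per frame. The function name, the cone-threshold parameter, and the point-cloud representation of the scene are all illustrative assumptions.

```python
import numpy as np

def gaze_heatmap(heads, gazes, scene_pts, angle_thresh_deg=10.0):
    """Accumulate a per-point attention heatmap.

    heads:     (F, 3) head positions, one per frame
    gazes:     (F, 3) gaze direction vectors, one per frame
    scene_pts: (N, 3) points sampled from the 3D scene model

    A scene point is counted as attended in a frame when the angle
    between the gaze vector and the head-to-point direction is below
    angle_thresh_deg.
    """
    heat = np.zeros(len(scene_pts))
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    for head, gaze in zip(heads, gazes):
        # Unit vectors from the head towards every scene point.
        d = scene_pts - head
        d = d / np.linalg.norm(d, axis=1, keepdims=True)
        g = gaze / np.linalg.norm(gaze)
        # Points inside the gaze cone get one hit for this frame.
        heat += (d @ g) >= cos_thresh
    return heat

# Toy usage: one frame looking along +x; the point ahead is attended,
# the point behind the head is not.
heads = np.array([[0.0, 0.0, 0.0]])
gazes = np.array([[1.0, 0.0, 0.0]])
scene = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
print(gaze_heatmap(heads, gazes, scene))  # → [1. 0.]
```

A real pipeline would additionally handle occlusion (a point behind a machine should not receive hits), e.g. via visibility tests against the scene mesh; this sketch only thresholds the viewing angle.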
Acknowledgments
This work was partially supported by the Italian MIUR within PRIN 2017, Project Grant 20172BH297: I-MALL - Improving the customer experience in stores by intelligent computer vision, and by the PNRR research activities of the consortium iNEST (Interconnected North-Est Innovation Ecosystem), funded by the European Union NextGenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) - Missione 4 Componente 2, Investimento 1.5 - D.D. 1058 23/06/2022, ECS_00000043). This manuscript reflects only the Authors' views and opinions; neither the European Union nor the European Commission can be considered responsible for them.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Toaiari, A. et al. (2023). SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer Science, Computer Science (R0)