Abstract
We present SCENE-pathy, a dataset and a set of baselines for studying the visual selective attention (VSA) of people towards the 3D scene in which they are located. In practice, VSA makes it possible to discover which parts of the scene are most attractive to an individual. Capturing VSA is of primary importance in marketing, retail management, surveillance, and many other fields. So far, VSA analysis has focused on very simple scenarios: a mall shelf or a tiny room, usually with a single subject involved. Our dataset, instead, considers a multi-person and much more complex 3D scenario, specifically a high-tech fair showroom presenting the machines of an Industry 4.0 production line, where 25 subjects were captured for two minutes each while moving, observing the scene, and having social interactions. The subjects also filled out a questionnaire indicating which part of the scene was most interesting to them. Data acquisition was performed with HoloLens 2 devices, which allowed us to obtain ground-truth data on people's tracklets and gaze trajectories. Our proposed baselines capture VSA from RGB video data and a 3D scene model alone, providing interpretable 3D heatmaps. In total, the dataset contains more than 100K RGB frames with, for each person, annotated 3D head positions and 3D gaze vectors. The dataset is available here: https://intelligolabs.github.io/scene-pathy.
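To give an intuition of how per-frame 3D head positions and gaze vectors can be aggregated into an interpretable 3D heatmap over a scene model, the following is a minimal sketch, not the paper's actual baseline: every scene point whose direction from the head lies within a fixed angular cone around the gaze vector accumulates one attention "hit" per frame. The function name, the cone-threshold parameter, and the point-cloud representation of the scene are all illustrative assumptions.

```python
import numpy as np

def gaze_heatmap(heads, gazes, scene_pts, angle_thresh_deg=10.0):
    """Accumulate a per-point attention heatmap.

    heads:     (F, 3) head positions, one per frame
    gazes:     (F, 3) gaze direction vectors, one per frame
    scene_pts: (N, 3) points sampled from the 3D scene model

    A scene point is counted as attended in a frame when the angle
    between the gaze vector and the head-to-point direction is below
    angle_thresh_deg.
    """
    heat = np.zeros(len(scene_pts))
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    for head, gaze in zip(heads, gazes):
        # Unit vectors from the head towards every scene point.
        d = scene_pts - head
        d = d / np.linalg.norm(d, axis=1, keepdims=True)
        g = gaze / np.linalg.norm(gaze)
        # Points inside the gaze cone get one hit for this frame.
        heat += (d @ g) >= cos_thresh
    return heat

# Toy usage: one frame looking along +x; the point ahead is attended,
# the point behind the head is not.
heads = np.array([[0.0, 0.0, 0.0]])
gazes = np.array([[1.0, 0.0, 0.0]])
scene = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
print(gaze_heatmap(heads, gazes, scene))  # → [1. 0.]
```

A real pipeline would additionally handle occlusion (a point behind a machine should not receive hits), e.g. via visibility tests against the scene mesh; this sketch only thresholds the viewing angle.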
Acknowledgments
This work was partially supported by the Italian MIUR within PRIN 2017, Project Grant 20172BH297: I-MALL - Improving the customer experience in stores by intelligent computer vision, and by the PNRR research activities of the consortium iNEST (Interconnected North-Est Innovation Ecosystem), funded by the European Union NextGenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) - Missione 4 Componente 2, Investimento 1.5 - D.D. 1058 23/06/2022, ECS_00000043). This manuscript reflects only the Authors' views and opinions; neither the European Union nor the European Commission can be considered responsible for them.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Toaiari, A. et al. (2023). SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer Science, Computer Science (R0)