
24 - Social Signal Processing for Surveillance

from Part IV - Applications of Social Signal Processing

Published online by Cambridge University Press: 13 July 2017

Dong Seon Cheng, Hankuk University of Foreign Studies
Marco Cristani, University of Verona
Judee K. Burgoon, University of Arizona
Nadia Magnenat-Thalmann, Université de Genève
Maja Pantic, Imperial College London
Alessandro Vinciarelli, University of Glasgow

Summary

Automated surveillance of human activities has traditionally been a computer vision field concerned with the recognition of motion patterns and with the production of high-level descriptions of actions and interactions among entities of interest (Cedras & Shah, 1995; Aggarwal & Cai, 1999; Gavrila, 1999; Moeslund, Hilton, & Krüger, 2006; Buxton, 2003; Hu et al., 2004; Turaga et al., 2008; Dee & Velastin, 2008; Aggarwal & Ryoo, 2011; Borges, Conci, & Cavallaro, 2013). The study of human activities has been revitalized in the last five years by the attention paid to so-called social signals (Pentland, 2007). These nonverbal cues, inspired by the social, affective, and psychological literature (Vinciarelli, Pantic, & Bourlard, 2009), have allowed a more principled understanding of how humans act and react to other people and to their environment.
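
To make the kind of motion-pattern analysis described above concrete, the minimal sketch below derives a coarse high-level description from a tracked trajectory by thresholding average speed. It is purely illustrative and not taken from the chapter: the two labels ("walking", "loitering"), the speed threshold, and the toy trajectories are hypothetical choices.

```python
# Hypothetical sketch: derive a coarse action label ("loitering" vs. "walking")
# from a tracked 2-D trajectory, in the spirit of classical motion-pattern
# analysis. Labels, threshold, and data are illustrative assumptions only.
import numpy as np

def describe_trajectory(points, fps=25.0, speed_threshold=0.5):
    """points: (N, 2) array of ground-plane positions in metres, one per frame."""
    points = np.asarray(points, dtype=float)
    if len(points) < 2:
        return "unknown"
    # Per-frame displacements and the resulting average speed (m/s).
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    avg_speed = steps.mean() * fps
    return "walking" if avg_speed > speed_threshold else "loitering"

if __name__ == "__main__":
    # A person drifting around one spot vs. a person crossing the scene.
    still = np.cumsum(np.random.normal(0.0, 0.002, size=(100, 2)), axis=0)
    moving = np.stack([np.linspace(0.0, 8.0, 100), np.zeros(100)], axis=1)
    print(describe_trajectory(still))   # -> loitering
    print(describe_trajectory(moving))  # -> walking
```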

Social Signal Processing (SSP) is the scientific field that carries out a systematic, algorithmic, and computational analysis of social signals, drawing key concepts from anthropology and social psychology (Vinciarelli et al., 2009). In particular, SSP does not stop at modeling human activities, but aims at coding and decoding human behavior. In other words, it focuses on unveiling the underlying hidden states that drive a person to act in a distinct way through particular actions. This challenge is supported by decades of investigation in the human sciences (psychology, anthropology, sociology, etc.), which have shown how humans use nonverbal behavioral cues, such as facial expressions, vocalizations (laughter, fillers, back-channels, etc.), gestures, or postures, to convey, often outside conscious awareness, their attitude toward other people and social environments, as well as their emotions (Richmond & McCroskey, 1995). Understanding these cues is thus paramount to understanding the social meaning of human activities.
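
One way to picture "decoding the hidden states behind observed cues" is as inference in a latent-state model. The sketch below runs Viterbi decoding on a toy hidden Markov model in which an unobserved attitude emits observable nonverbal cues; the states, cue vocabulary, and all probabilities are invented for illustration and are not the chapter's model.

```python
# Hypothetical toy model: a hidden "attitude" (friendly / hostile) emits
# observable nonverbal cues (smile, laughter, frown). Viterbi decoding
# recovers the most likely hidden-state sequence from the cue sequence.
# All states, cues, and probabilities are invented for illustration.
import numpy as np

states = ["friendly", "hostile"]
cues = {"smile": 0, "laughter": 1, "frown": 2}

start = np.log(np.array([0.6, 0.4]))
trans = np.log(np.array([[0.8, 0.2],     # friendly -> friendly / hostile
                         [0.3, 0.7]]))   # hostile  -> friendly / hostile
emit = np.log(np.array([[0.5, 0.4, 0.1],   # P(cue | friendly)
                        [0.1, 0.1, 0.8]])) # P(cue | hostile)

def viterbi(observed):
    obs = [cues[c] for c in observed]
    n_states, T = len(states), len(obs)
    score = np.full((T, n_states), -np.inf)   # best log-probability per state
    back = np.zeros((T, n_states), dtype=int) # best predecessor per state
    score[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        for s in range(n_states):
            cand = score[t - 1] + trans[:, s]
            back[t, s] = int(np.argmax(cand))
            score[t, s] = cand[back[t, s]] + emit[s, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi(["smile", "laughter", "frown", "frown"]))
# -> ['friendly', 'friendly', 'hostile', 'hostile']
```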

The formal marriage of automated video surveillance and Social Signal Processing had its programmatic start at SISM 2010 (the International Workshop on Socially Intelligent Surveillance and Monitoring; http://profs.sci.univr.it/~cristanm/SISM2010/), held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition. At that venue, the discussion focused on what kinds of social signals can be captured in a generic surveillance setting and on the specific scenarios where modeling social aspects could be most beneficial.

After 2010, hybridizations of SSP with surveillance applications have grown rapidly in number, and systematic essays on the topic have started to appear in the computer vision literature (Cristani et al., 2013).

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2017


References

Abbasi, A. & Chen, H. (2008). Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems, 26(2), 1–29.
Aggarwal, J. K. & Cai, Q. (1999). Human motion analysis: A review. Computer Vision and Image Understanding, 73(3), 428–440.
Aggarwal, J. K. & Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys, 43, 1–43.
Ambady, N. & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.
Anderson, R. J. (2001). Security Engineering: A Guide to Building Dependable Distributed Systems. New York: John Wiley & Sons.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 1014–1021).
Ba, S. O. & Odobez, J. M. (2006). A study on visual focus of attention recognition from head pose in a meeting room. Lecture Notes in Computer Science, 4299, 75–87.
Bazzani, L., Cristani, M., & Murino, V. (2012). Decentralized particle filter for joint individual-group tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1888–1893).
Bazzani, L., Cristani, M., Tosato, D., et al. (2011). Social interactions by visual focus of attention in a three-dimensional environment. Expert Systems, 30(2), 115–127.
Benfold, B. & Reid, I. (2009). Guiding visual surveillance by tracking human attention. In Proceedings of the 20th British Machine Vision Conference, September.
Bolle, R., Connell, J., Pankanti, S., Ratha, N., & Senior, A. (2003). Guide to Biometrics. New York: Springer.
Borges, P. V. K., Conci, N., & Cavallaro, A. (2013). Video-based human behavior understanding: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 23(11), 1993–2008.
Buxton, H. (2003). Learning and understanding dynamic scene activity: A review. Image and Vision Computing, 21(1), 125–136.
Cassell, J. (1998). A framework for gesture generation and interpretation. In R. Cipolla & A. Pentland (Eds.), Computer Vision in Human–Machine Interaction (pp. 191–215). New York: Cambridge University Press.
Cedras, C. & Shah, M. (1995). Motion-based recognition: A survey. Image and Vision Computing, 13(2), 129–155.
Chen, C. & Odobez, J. (2012). We are not contortionists: Coupled adaptive learning for head and body orientation estimation in surveillance video. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1544–1551).
Cristani, M., Bazzani, L., Paggetti, G., et al. (2011). Social interaction discovery by statistical analysis of F-formations. In J. Hoey, S. McKenna, & E. Trucco (Eds.), Proceedings of British Machine Vision Conference (pp. 23.1–23.12). Guildford, UK: BMVA Press.
Cristani, M., Paggetti, G., Vinciarelli, A., et al. (2011). Towards computational proxemics: Inferring social relations from interpersonal distances. In Proceedings of Third IEEE International Conference on Social Computing (pp. 290–297).
Cristani, M., Pesarin, A., Vinciarelli, A., Crocco, M., & Murino, V. (2011). Look at who's talking: Voice activity detection by automated gesture analysis. In Proceedings of the Workshop on Interactive Human Behavior Analysis in Open or Public Spaces (InterHub 2011).
Cristani, M., Raghavendra, R., Del Bue, A., & Murino, V. (2013). Human behavior analysis in video surveillance: A social signal processing perspective. Neurocomputing, 100(2), 86–97.
Cristani, M., Roffo, G., Segalin, C., et al. (2012). Conversationally inspired stylometric features for authorship attribution in instant messaging. In Proceedings of the 20th ACM International Conference on Multimedia (pp. 1121–1124).
Curhan, J. R. & Pentland, A. (2007). Thin slices of negotiation: Predicting outcomes from conversational dynamics within the first five minutes. Journal of Applied Psychology, 92(3), 802–811.
Dee, H. M. & Velastin, S. A. (2008). How close are we to solving the problem of automated visual surveillance? Machine Vision and Applications, 19(2), 329–343.
Deng, Z., Xu, D., Zhang, X., & Jiang, X. (2012). IntroLib: Efficient and transparent library call introspection for malware forensics. In 12th Annual Digital Forensics Research Conference (pp. 13–23).
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. New York: John Wiley & Sons.
Ellison, N. B., Steinfield, C., & Lampe, C. (2007). The benefits of Facebook “friends”: Social capital and college students’ use of online social network sites. Journal of Computer-Mediated Communication, 12(4), 1143–1168.
Fuchs, C. (2012). Internet and Surveillance: The Challenges of Web 2.0 and Social Media. New York: Routledge.
Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1), 82–98.
Goffman, E. (1966). Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: Free Press.
Groh, G., Lehmann, A., Reimers, J., Friess, M. R., & Schwarz, L. (2010). Detecting social situations from interaction geometry. In Proceedings of the 2010 IEEE Second International Conference on Social Computing (pp. 1–8).
Hall, E. T. (1966). The Hidden Dimension. Garden City, NY: Doubleday.
Harman, J. P., Hansen, C. E., Cochran, M. E., & Lindsey, C. R. (2005). Liar, liar: Internet faking but not frequency of use affects social skills, self-esteem, social anxiety, and aggression. Cyberpsychology & Behavior, 8(1), 1–6.
Helbing, D. & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review E, 51(5), 4282–4287.
Hu, W., Tan, T., Wang, L., & Maybank, S. (2004). A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man and Cybernetics, 34, 334–352.
Hung, H., Huang, Y., Yeo, C., & Gatica-Perez, D. (2008). Associating audio-visual activity cues in a dominance estimation framework. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 23–28, Anchorage, AK.
Hung, H. & Kröse, B. (2011). Detecting F-formations as dominant sets. In Proceedings of the International Conference on Multimodal Interaction (pp. 231–238).
Kendon, A. (1990). Conducting Interaction: Patterns of Behavior in Focused Encounters. New York: Cambridge University Press.
Kuncheva, L. I. (2007). A stability index for feature selection. In Proceedings of IASTED International Multi-Conference Artificial Intelligence and Applications (pp. 390–395).
Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision, 64(2–3), 107–123.
Li, Y., Fathi, A., & Rehg, J. M. (2013). Learning to predict gaze in egocentric video. In Proceedings of 14th IEEE International Conference on Computer Vision (pp. 3216–3223).
Lin, W.-C. & Liu, Y. (2007). A lattice-based MRF model for dynamic near-regular texture tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 777–792.
Liu, H. & Motoda, H. (2008). Computational Methods of Feature Selection. Boca Raton, FL: Chapman & Hall/CRC.
Liu, X., Krahnstoever, N., Yu, T., & Tu, P. (2007). What are customers looking at? In Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (pp. 405–410).
Livingstone, S. & Brake, D. R. (2010). On the rapid rise of social networking sites: New findings and policy implications. Children & Society, 24(1), 75–83.
Lott, D. F. & Sommer, R. (1967). Seating arrangements and status. Journal of Personality and Social Psychology, 7(1), 90–95.
Mauthner, T., Donoser, M., & Bischof, H. (2008). Robust tracking of spatial related components. In Proceedings of the International Conference on Pattern Recognition (pp. 1–4).
Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2), 90–126.
Newman, R. C. (2006). Cybercrime, identity theft, and fraud: Practicing safe Internet – network security threats and vulnerabilities. In Proceedings of the 3rd Annual Conference on Information Security Curriculum Development (pp. 68–78).
Oberschall, A. (1978). Theories of social conflict. Annual Review of Sociology, 4, 291–315.
Oikonomopoulos, A., Patras, I., & Pantic, M. (2011). Spatiotemporal localization and categorization of human actions in unsegmented image sequences. IEEE Transactions on Image Processing, 20(4), 1126–1140.
Orebaugh, A. & Allnutt, J. (2009). Classification of instant messaging communications for forensics analysis. International Journal of Forensic Computer Science, 1, 22–28.
Panero, J. & Zelnik, M. (1979). Human Dimension and Interior Space: A Source Book of Design. New York: Whitney Library of Design.
Pang, S. K., Li, J., & Godsill, S. (2007). Models and algorithms for detection and tracking of coordinated groups. In Proceedings of International Symposium on Image and Signal Processing and Analysis (pp. 504–509).
Park, S. & Trivedi, M. M. (2007). Multi-person interaction and activity analysis: A synergistic track- and body-level analysis framework. Machine Vision and Applications, 18, 151–166.
Pavan, M. & Pelillo, M. (2007). Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 167–172.
Pellegrini, S., Ess, A., Schindler, K., & Van Gool, L. (2009). You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of 12th International Conference on Computer Vision, Kyoto, Japan (pp. 261–268).
Pellegrini, S., Ess, A., & Van Gool, L. (2010). Improving data association by joint modeling of pedestrian trajectories and groupings. In Proceedings of European Conference on Computer Vision (pp. 452–465).
Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108–111.
Pesarin, A., Cristani, M., Murino, V., & Vinciarelli, A. (2012). Conversation analysis at work: Detection of conflict in competitive discussions through semi-automatic turn-organization analysis. Cognitive Processing, 13(2), 533–540.
Pianesi, F., Mana, N., Ceppelletti, A., Lepri, B., & Zancanaro, M. (2008). Multimodal recognition of personality traits in social interactions. In Proceedings of International Conference on Multimodal Interfaces (pp. 53–60).
Popa, M., Koc, A. K., Rothkrantz, L. J. M., Shan, C., & Wiggers, P. (2012). Kinect sensing of shopping related actions. In R. Wichert, K. van Laerhoven, & J. Gelissen (Eds.), Constructing Ambient Intelligence (vol. 277, pp. 91–100). Berlin: Springer.
Qin, Z. & Shelton, C. R. (2012). Improving multi-target tracking via social grouping. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1972–1978).
Rajagopalan, S. S., Dhall, A., & Goecke, R. (2013). Self-stimulatory behaviours in the wild for autism diagnosis. In Proceedings of IEEE Workshop on Decoding Subtle Cues from Social Interactions (associated with ICCV 2013) (pp. 755–761).
Richmond, V. & McCroskey, J. (1995). Nonverbal Behaviors in Interpersonal Relations. Boston: Allyn and Bacon.
Robertson, N. M. & Reid, I. D. (2011). Automatic reasoning about causal events in surveillance video. EURASIP Journal on Image and Video Processing, 1, 1–19.
Russo, N. (1967). Connotation of seating arrangements. The Cornell Journal of Social Relations, 2(1), 37–44.
Salamin, H., Favre, S., & Vinciarelli, A. (2009). Automatic role recognition in multiparty recordings: Using social affiliation networks for feature extraction. IEEE Transactions on Multimedia, 11(7), 1373–1380.
Schegloff, E. (2000). Overlapping talk and the organisation of turn-taking for conversation. Language in Society, 29(1), 1–63.
Scovanner, P. & Tappen, M. F. (2009). Learning pedestrian dynamics from the real world. In Proceedings of International Conference on Computer Vision (pp. 381–388).
Smith, K., Ba, S., Odobez, J., & Gatica-Perez, D. (2008). Tracking the visual focus of attention for a varying number of wandering people. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1–18.
Stiefelhagen, R., Finke, M., Yang, J., & Waibel, A. (1999). From gaze to focus of attention. Lecture Notes in Computer Science, 1614, 761–768.
Stiefelhagen, R., Yang, J., & Waibel, A. (2002). Modeling focus of attention for meeting indexing based on multiple cues. IEEE Transactions on Neural Networks, 13, 928–938.
Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology, 33, 1–39.
Tosato, D., Spera, M., Cristani, M., & Murino, V. (2013). Characterizing humans on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 2–15.
Turaga, P., Chellappa, R., Subrahmanian, V. S., & Udrea, O. (2008). Machine recognition of human activities: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 18(11), 1473–1488.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Yamaguchi, K., Berg, A. C., Ortiz, L. E., & Berg, T. L. (2011). Who are you with and where are you going? In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1345–1352).
Yang, Y. & Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1385–1392).
Zen, G., Lepri, B., Ricci, E., & Lanz, O. (2010). Space speaks: Towards socially and personality aware visual surveillance. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (pp. 37–42).
Zhou, L. & Zhang, D. (2004). Can online behavior unveil deceivers? An exploratory investigation of deception in instant messaging. In Proceedings of the Hawaii International Conference on System Sciences (no. 37, p. 22).
