
Robust multimodal audio–visual processing for advanced context awareness in smart spaces

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Identifying people and tracking their locations is a key prerequisite for achieving context awareness in smart spaces. Moreover, in realistic context-aware applications, these tasks have to be carried out in a non-obtrusive fashion. In this paper, we present a set of robust person-identification and tracking algorithms based on audio and visual processing. A main characteristic of these algorithms is that they operate on far-field, unconstrained audio–visual streams, which ensures that they are non-intrusive. We also show that their outputs can be combined into composite multimodal tracking components, which are suitable for supporting a broad range of context-aware services. To combine the audio–visual processing results, we exploit a context-modeling approach based on a graph of situations. Accordingly, we discuss the implementation of realistic prototype applications that make use of the full range of audio, visual and multimodal algorithms.



Acknowledgments

This work is sponsored by the European Union under the integrated project CHIL, contract number 506909.

Author information

Corresponding author

Correspondence to A. Pnevmatikakis.

About this article

Cite this article

Pnevmatikakis, A., Soldatos, J., Talantzis, F. et al. Robust multimodal audio–visual processing for advanced context awareness in smart spaces. Pers Ubiquit Comput 13, 3–14 (2009). https://doi.org/10.1007/s00779-007-0169-9

  • DOI: https://doi.org/10.1007/s00779-007-0169-9
