Audio-video array source separation for perceptual user interfaces

Published: 15 November 2001
DOI: 10.1145/971478.971500

ABSTRACT

Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
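
To make "steering the focus of the array" concrete, the following is a minimal sketch of delay-and-sum beamforming toward a known 3-D source position, such as one a video tracker might supply. It is not the paper's implementation: the function names, the integer-sample delays, and the speed-of-sound constant are assumptions chosen for illustration, and the abstract does not say which beamformer the authors use.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def steering_delays(mic_positions, source, sample_rate):
    """Delay of each microphone, in whole samples, relative to the nearest one."""
    dists = [math.dist(m, source) for m in mic_positions]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in dists]


def delay_and_sum(signals, delays):
    """Advance each channel by its delay, then average the aligned samples."""
    # Only keep samples for which every advanced channel still has data.
    length = min(len(s) - d for s, d in zip(signals, delays))
    return [
        sum(s[n + d] for s, d in zip(signals, delays)) / len(signals)
        for n in range(length)
    ]
```

Sound from the steered position adds coherently after alignment, while sound from other positions is misaligned and partly averaged out. The sketch also shows why calibration matters for video-guided steering: an error in `source` or `mic_positions` translates directly into wrong delays.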

    • Published in

      PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces
      November 2001
      241 pages
      ISBN:9781450374736
      DOI:10.1145/971478

      Copyright © 2001 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
