ABSTRACT
Steerable microphone arrays provide a flexible infrastructure for audio source separation. To be used effectively in perceptual user interfaces, they need a mechanism for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision, but they require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
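To make "steering the focus of the array" concrete, the following is a minimal sketch of delay-and-sum focusing, one common way to steer a microphone array toward a known 3-D point. The abstract does not say which beamforming method the authors use, so the technique itself is an assumption here, as are the names `steering_delays` and `delay_and_sum`, the speed-of-sound constant, and the rounding of delays to whole samples. The focus point could come from any localizer, for example a video tracker.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, focus_point, c=SPEED_OF_SOUND):
    """Propagation delay (seconds) from focus_point to each microphone,
    measured relative to the microphone nearest the focus point."""
    distances = [math.dist(m, focus_point) for m in mic_positions]
    nearest = min(distances)
    return [(d - nearest) / c for d in distances]


def delay_and_sum(signals, delays, sample_rate):
    """Advance each channel by its delay (rounded to whole samples) so that
    sound from the focus point lines up across channels, then average."""
    shifts = [round(d * sample_rate) for d in delays]
    # Keep only the samples that every shifted channel can supply.
    n = min(len(ch) - s for ch, s in zip(signals, shifts))
    return [
        sum(ch[s + t] for ch, s in zip(signals, shifts)) / len(signals)
        for t in range(n)
    ]
```

Sound arriving from the focus point adds coherently after alignment, while sound from other locations is misaligned and partially cancels. That is also why an inaccurate focus estimate, or poor calibration between the camera and microphone coordinate frames, reduces the benefit.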
Index Terms
- Audio-video array source separation for perceptual user interfaces