Audio-video array source separation for perceptual user interfaces

Authors:
Kevin Wilson, MIT Artificial Intelligence Lab, Cambridge, MA
Neal Checka, MIT Artificial Intelligence Lab, Cambridge, MA
David Demirdjian, MIT Artificial Intelligence Lab, Cambridge, MA
Trevor Darrell, MIT Artificial Intelligence Lab, Cambridge, MA

PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces, November 2001, Pages 1–7
https://doi.org/10.1145/971478.971500

Published: 15 November 2001

ABSTRACT

Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
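
To make the idea of steering an array toward a located source concrete, the following is a minimal sketch of delay-and-sum beamforming, followed by a small audio-energy search around an initial position such as one a video tracker might supply. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two modalities are combined, so `refine_location`, the grid search, the whole-sample delays, the speed-of-sound constant, and all function and parameter names are hypothetical choices made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def relative_shifts(mic_positions, source_position, fs, c=SPEED_OF_SOUND):
    """Whole-sample arrival delays at each microphone, relative to the nearest one."""
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(source_position, dtype=float)
    delays = np.linalg.norm(mics - src, axis=1) / c  # propagation time in seconds
    return np.round((delays - delays.min()) * fs).astype(int)


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer the array to source_position by aligning and averaging the channels.

    signals has shape (num_mics, num_samples). Delays are rounded to whole
    samples, a simplification that keeps the sketch short.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    shifts = relative_shifts(mic_positions, source_position, fs)
    out = np.zeros(num_samples)
    for channel, shift in zip(signals, shifts):
        if shift < num_samples:
            # Advance the channel by its delay so copies of the source line up.
            out[: num_samples - shift] += channel[shift:]
    return out / num_mics


def refine_location(signals, mic_positions, initial_position, fs,
                    radius=0.3, step=0.1):
    """Hypothetical audio refinement of an initial (e.g. video-derived) position.

    Searches a horizontal grid around initial_position and returns the
    candidate whose steered output has the highest energy.
    """
    initial = np.asarray(initial_position, dtype=float)
    offsets = np.arange(-radius, radius + step / 2, step)
    best_position, best_energy = initial, -np.inf
    for dx in offsets:
        for dy in offsets:
            candidate = initial + np.array([dx, dy, 0.0])
            steered = delay_and_sum(signals, mic_positions, candidate, fs)
            energy = np.sum(steered ** 2)
            if energy > best_energy:
                best_position, best_energy = candidate, energy
    return best_position
```

In this sketch, calling `refine_location(signals, mics, video_estimate, fs)` and then `delay_and_sum` at the returned position mirrors the general pattern the abstract describes: a coarse estimate from one modality narrows the search, and the audio signal itself decides the final focus. A real system would likely need fractional-sample delays and a more robust objective in reverberant rooms.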

Published in

PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces
November 2001
241 pages
ISBN: 9781450374736
DOI: 10.1145/971478

Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States