Audio-Visual Source Separation with Alternating Diffusion Maps

Dov, David; Talmon, Ronen; Cohen, Israel

doi:10.1007/978-3-319-73031-8_14

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter we consider the separation of multiple sound sources of different types, including multiple speakers and transients, which are measured by a single microphone and by a video camera. We address the problem of separating a particular sound source from all the others, focusing on obtaining an underlying representation of that source while attenuating the rest. When the video camera is pointed only at the desired sound source, the problem becomes equivalent to extracting the source common to the audio and video modalities while ignoring the other sources. We use a kernel-based method designed specifically for this task, which provides an underlying representation of the common source. We demonstrate the usefulness of the obtained representation for activity detection of the common source and discuss how it may be further used for source separation.
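The abstract does not spell out the kernel-based method, so the following is only a minimal sketch of the general alternating-diffusion idea named in the title, not the chapter's implementation. It assumes that per-frame audio and video feature vectors are already available and time-aligned. The function names, the Gaussian kernel, and the median-based kernel scale are illustrative assumptions. Each modality gets its own row-stochastic affinity kernel, the two kernels are multiplied, and the leading non-trivial eigenvectors of the product serve as a representation that tends to emphasize what the two modalities share.

```python
import numpy as np


def row_stochastic_kernel(features):
    """Gaussian affinity between frames, normalized so every row sums to 1.

    The median-based scale is a common heuristic and an assumption here,
    not a choice taken from the chapter.
    """
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    scale = np.median(sq_dists) + 1e-12
    affinity = np.exp(-sq_dists / scale)
    return affinity / affinity.sum(axis=1, keepdims=True)


def alternating_diffusion_embedding(audio_features, video_features, n_components=2):
    """Embed frames using the product of the two single-modality kernels."""
    audio_kernel = row_stochastic_kernel(audio_features)
    video_kernel = row_stochastic_kernel(video_features)
    combined = video_kernel @ audio_kernel  # one diffusion step in each modality
    eigvals, eigvecs = np.linalg.eig(combined)
    order = np.argsort(-np.abs(eigvals))
    keep = order[1:n_components + 1]  # skip the trivial constant eigenvector
    return np.real(eigvecs[:, keep]), combined


# Synthetic stand-in data: one shared variable plus modality-specific nuisance.
rng = np.random.default_rng(0)
n_frames = 200
common = rng.uniform(0.0, 1.0, n_frames)          # hypothetical common source
audio_only = rng.uniform(0.0, 1.0, n_frames)      # e.g. an interfering sound
video_only = rng.uniform(0.0, 1.0, n_frames)      # e.g. unrelated motion
audio = np.column_stack([np.cos(np.pi * common), audio_only])
video = np.column_stack([common ** 2, video_only])

embedding, combined = alternating_diffusion_embedding(audio, video)
print(embedding.shape)
```

In this sketch the product of two row-stochastic matrices is itself row-stochastic, so its leading eigenvector is constant and is discarded. How the resulting coordinates are turned into an activity detector or used for separation is left to the chapter itself.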

This research was supported by the Israel Science Foundation (grant no. 576/16).

Author information

Authors and Affiliations

Andrew and Erna Viterbi Faculty of Electrical Engineering, The Technion-Israel Institute of Technology, 32000, Haifa, Israel
David Dov, Ronen Talmon & Israel Cohen

Corresponding author

Correspondence to David Dov.

Editor information

Editors and Affiliations

University of Tsukuba, Ibaraki, Japan
Shoji Makino

About this chapter

Cite this chapter

Dov, D., Talmon, R., Cohen, I. (2018). Audio-Visual Source Separation with Alternating Diffusion Maps. In: Makino, S. (eds) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_14

DOI: https://doi.org/10.1007/978-3-319-73031-8_14
Published: 02 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73030-1
Online ISBN: 978-3-319-73031-8
eBook Packages: Engineering, Engineering (R0)
