Multi-View Spectrogram Transformer for Respiratory Sound Classification

He, Wentao; Yan, Yuchen; Ren, Jianfeng; Bai, Ruibin; Jiang, Xudong

arXiv:2311.09655 (cs)

[Submitted on 16 Nov 2023 (v1), last revised 5 Dec 2023 (this version, v2)]

Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different-sized patches, representing the multi-view acoustic elements of a respiratory sound. These patches and positional embeddings are then fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
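
The two ideas the abstract names, multi-view patching and gated fusion, can be illustrated with a small sketch. This is not the authors' implementation. The patch shapes, the mean-pooling stand-in for the transformer encoders, the softmax form of the gate, and the fixed gate scores are all assumptions made for illustration; in a trained model the gate scores would be learned. The helper names `split_into_patches` and `gated_fusion` are hypothetical. The code uses only the Python standard library so that it runs as written.

```python
import math


def split_into_patches(spec, patch_h, patch_w):
    """Split a 2-D spectrogram (list of rows) into non-overlapping, flattened patches.

    Rows are frequency bins and columns are time frames. Edge regions that do
    not fill a whole patch are dropped (an assumption made for simplicity).
    """
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_freq - patch_h + 1, patch_h):
        for t0 in range(0, n_time - patch_w + 1, patch_w):
            patch = [spec[f][t]
                     for f in range(f0, f0 + patch_h)
                     for t in range(t0, t0 + patch_w)]
            patches.append(patch)
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(view_features, gate_scores):
    """Fuse per-view feature vectors as a weighted sum.

    The weights are softmax(gate_scores). This softmax gate is an assumed
    form, chosen only to show how a gate can emphasize one view.
    """
    weights = softmax(gate_scores)
    dim = len(view_features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, view_features))
             for i in range(dim)]
    return fused, weights


# Toy 8x8 "mel-spectrogram": 8 frequency bins by 8 time frames.
spec = [[float(f * 8 + t) for t in range(8)] for f in range(8)]

# Hypothetical patch shapes (height in frequency bins, width in time frames).
views = {"square": (4, 4), "wide_in_time": (2, 8), "tall_in_frequency": (8, 2)}
patches_per_view = {name: split_into_patches(spec, h, w)
                    for name, (h, w) in views.items()}

# Stand-in for the transformer encoders: one mean value per patch.
view_features = [[sum(p) / len(p) for p in patches]
                 for patches in patches_per_view.values()]

# Fixed gate scores for illustration; a real model would learn these.
fused, weights = gated_fusion(view_features, [0.5, 1.5, 0.5])
print("gate weights:", weights)
print("fused feature:", fused)
```

Each view covers the same spectrogram with patches of equal area but different aspect ratios, so a patch that is wide in time keeps fine frequency resolution while a patch that is tall in frequency keeps fine time resolution. The gate then decides how much each view contributes to the fused feature.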

Comments:	Under review
Subjects:	Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2311.09655 [cs.SD]
	(or arXiv:2311.09655v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2311.09655

Submission history

From: Yuchen Yan
[v1] Thu, 16 Nov 2023 08:17:02 UTC (1,269 KB)
[v2] Tue, 5 Dec 2023 09:10:37 UTC (1,269 KB)
