
Environmental Sound Classification Based on CAR-Transformer Neural Network Model


Abstract

Environmental Sound Classification (ESC) is a challenging task in the audio field because of the wide variety of ambient sounds involved. In this paper, we propose a method for ESC based on the CAR-Transformer neural network model, comprising three stages: sound-sample pre-processing, deep-learning-based feature extraction, and classification. We convert the one-dimensional audio signal into two-dimensional Mel-Frequency Cepstral Coefficients (MFCCs) and use them as the feature map of the audio. The CAR-Transformer model extracts features from this map, and after dimensionality reduction we use a fully connected layer as the classifier to obtain the final results. The method achieves a classification accuracy of 96.91% on the UrbanSound8K dataset with only 0.16 M model parameters, and we compare these results with other state-of-the-art work.
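
The abstract outlines a three-stage pipeline: MFCC extraction, CAR-Transformer feature extraction, and a fully connected classifier. As a rough illustration, the sketch below shows the pre-processing and classifier stages in Python. The exact MFCC settings (sample rate, number of coefficients, clip length) and layer dimensions are assumptions, since this preview does not specify them, and the CAR-Transformer backbone itself is out of scope here.

```python
# Minimal sketch of the pipeline described in the abstract: a 1-D audio
# signal is converted to a 2-D MFCC feature map, a backbone (the paper's
# CAR-Transformer, not reproduced here) extracts features, and a fully
# connected layer classifies them. n_mfcc=40, 4-second clips, and the
# layer sizes are illustrative assumptions, not the paper's published
# configuration.
import librosa
import numpy as np
import torch
import torch.nn as nn

def audio_to_mfcc(path: str, sr: int = 22050, n_mfcc: int = 40) -> np.ndarray:
    """Load a clip and return its 2-D MFCC feature map (n_mfcc x frames)."""
    # UrbanSound8K clips are at most 4 s long, so truncate to that duration.
    y, sr = librosa.load(path, sr=sr, duration=4.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

class ClassifierHead(nn.Module):
    """Dimensionality reduction plus fully connected classifier, applied to
    the feature map produced by the (omitted) feature-extraction backbone."""
    def __init__(self, feat_channels: int, n_classes: int = 10):
        # 10 classes matches UrbanSound8K; feat_channels depends on the backbone.
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # reduce each channel map to one value
        self.fc = nn.Linear(feat_channels, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from the backbone
        return self.fc(self.pool(x).flatten(1))
```

Global average pooling is one common way to realize the "dimensionality reduction" the abstract mentions before the fully connected layer; the authors' actual reduction step may differ.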



Acknowledgements

This work was supported by the Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).

Author information


Corresponding author

Correspondence to Aibin Chen.


About this article

Cite this article

Li, H., Chen, A., Yi, J. et al. Environmental Sound Classification Based on CAR-Transformer Neural Network Model. Circuits Syst Signal Process 42, 5289–5312 (2023). https://doi.org/10.1007/s00034-023-02339-w

