
Environmental Sound Classification Based on CAR-Transformer Neural Network Model


Abstract

Environmental Sound Classification (ESC) is a challenging task in the audio field because of the wide variety of ambient sounds involved. In this paper, we propose a method for ESC based on the CAR-Transformer neural network model, comprising three stages: sound-sample pre-processing, deep-learning-based feature extraction, and classification. We convert the one-dimensional audio signal into two-dimensional Mel-Frequency Cepstral Coefficients (MFCCs) and use them as the feature map of the audio. The CAR-Transformer model extracts features from this map, and after dimensionality reduction we use a fully connected layer as the classifier to obtain the final results. The method achieves a classification accuracy of 96.91% on the UrbanSound8K dataset with only 0.16 M model parameters, and we compare these results with other state-of-the-art work.
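
The abstract outlines a three-stage pipeline: MFCC extraction, CAR-Transformer feature extraction, and a fully connected classifier. As a rough illustration, the sketch below shows the pre-processing and classifier stages in Python. The exact MFCC settings (sample rate, number of coefficients, clip length) and layer dimensions are assumptions, since this preview does not specify them, and the CAR-Transformer backbone itself is out of scope here.

```python
# Minimal sketch of the pipeline described in the abstract: a 1-D audio
# signal is converted to a 2-D MFCC feature map, a backbone (the paper's
# CAR-Transformer, not reproduced here) extracts features, and a fully
# connected layer classifies them. n_mfcc=40, 4-second clips, and the
# layer sizes are illustrative assumptions, not the paper's published
# configuration.
import librosa
import numpy as np
import torch
import torch.nn as nn

def audio_to_mfcc(path: str, sr: int = 22050, n_mfcc: int = 40) -> np.ndarray:
    """Load a clip and return its 2-D MFCC feature map (n_mfcc x frames)."""
    # UrbanSound8K clips are at most 4 s long, so truncate to that duration.
    y, sr = librosa.load(path, sr=sr, duration=4.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

class ClassifierHead(nn.Module):
    """Dimensionality reduction plus fully connected classifier, applied to
    the feature map produced by the (omitted) feature-extraction backbone."""
    def __init__(self, feat_channels: int, n_classes: int = 10):
        # 10 classes matches UrbanSound8K; feat_channels depends on the backbone.
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # reduce each channel map to one value
        self.fc = nn.Linear(feat_channels, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from the backbone
        return self.fc(self.pool(x).flatten(1))
```

Global average pooling is one common way to realize the "dimensionality reduction" the abstract mentions before the fully connected layer; the authors' actual reduction step may differ.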



Acknowledgements

This work was supported by the Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).

Author information


Corresponding author

Correspondence to Aibin Chen.


About this article

Cite this article

Li, H., Chen, A., Yi, J. et al. Environmental Sound Classification Based on CAR-Transformer Neural Network Model. Circuits Syst Signal Process 42, 5289–5312 (2023). https://doi.org/10.1007/s00034-023-02339-w

