Abstract
In this paper, we propose a Predictive AutoEncoder (PAE) that exploits context information for unsupervised anomalous sound detection (ASD). Conventional unsupervised ASD approaches mainly employ straightforward deep neural networks (DNNs) to detect abnormal sounds. However, these models fail to exploit the relationships between frames, which limits their performance and constrains the input length. Recently, context information has proven effective for sequence data processing. In our method, a PAE consisting of transformer blocks is proposed to predict unseen frames from the remaining available inputs. Based on the self-attention mechanism, our model captures not only content information within each frame but also context information across frames, improving ASD performance. Moreover, our method extends the input length of AE-based models thanks to its outstanding capability for long-range sequence modeling. Extensive experiments conducted on the DCASE2020 Task2 development dataset demonstrate that our method outperforms state-of-the-art AE-based methods and verify the effectiveness and stability of our proposed method for long-range temporal inputs.
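To make the idea concrete, the following is a minimal, illustrative sketch of the prediction-based scoring the abstract describes: one spectrogram frame is masked, a self-attention layer predicts it from the remaining context frames, and the reconstruction error of the masked frame serves as the anomaly score. The attention weights here are random placeholders, not the paper's trained PAE, and the single-head attention stands in for full transformer blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over the frame axis.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def anomaly_score(spectrogram, mask_idx, Wq, Wk, Wv):
    """Mask one frame, predict it from the remaining context,
    and return the masked frame's reconstruction error."""
    X = spectrogram.copy()
    target = X[mask_idx].copy()
    X[mask_idx] = 0.0  # hide the frame; context frames stay visible
    pred = self_attention(X, Wq, Wk, Wv)[mask_idx]
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
T, D = 32, 16  # frames x mel bins (illustrative sizes)
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
spec = rng.normal(size=(T, D))
score = anomaly_score(spec, mask_idx=5, Wq=Wq, Wk=Wk, Wv=Wv)
print(score >= 0.0)
```

In the actual method the attention parameters would be trained on normal machine sounds, so a large prediction error at test time signals an anomalous frame.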
Notes
1. DCASE: Detection and Classification of Acoustic Scenes and Events, https://dcase.community.
Acknowledgements
This work was supported by the Leading Plan of CAS (XDC08030200).
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Zeng, XM., Song, Y., Dai, LR., Liu, L. (2023). Predictive AutoEncoders Are Context-Aware Unsupervised Anomalous Sound Detectors. In: Zhenhua, L., Jianqing, G., Kai, Y., Jia, J. (eds) Man-Machine Speech Communication. NCMMSC 2022. Communications in Computer and Information Science, vol 1765. Springer, Singapore. https://doi.org/10.1007/978-981-99-2401-1_9
Print ISBN: 978-981-99-2400-4
Online ISBN: 978-981-99-2401-1