Enhancing Feature Representation for Anomaly Detection via Local-and-Global Temporal Relations and a Multi-stage Memory

Li, Xuan; Ma, Ding; Wu, Xiangqian

doi:10.1007/978-981-99-8537-1_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14430)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Weakly supervised video anomaly detection is challenging because frame-level labels are not available at training time. Tackling this task effectively requires models to learn discriminative feature representations. To this end, we propose a multi-stage memory-augmented feature discrimination learning (MMFDL) method. The first stage obtains preliminary abnormal probabilities for clip features. In the second stage, an easy normal pattern memory (ENPM) is proposed to store normal patterns with low abnormal probabilities. In the last stage, we pull clip features with high abnormal probabilities in normal videos close to the ENPM and push them away from clip features with high abnormal probabilities in abnormal videos, so that the model learns more discriminative features for anomaly detection. Furthermore, we propose a local-and-global temporal relations modeling (LGTRM) module that enhances clip features by aggregating local and global contexts. The LGTRM module consists of two subnetworks: DW-Net and TF-Net. DW-Net integrates the current clip feature with its adjacent clip features to capture local-range temporal dependencies. TF-Net uses the multi-head self-attention mechanism of the transformer to capture global-range temporal dependencies. Experiments on two datasets demonstrate that our method outperforms state-of-the-art approaches. The code is available at https://github.com/xuanli01/PRCV347.
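
The local-and-global idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). The assumptions here are: the local branch is a depthwise temporal convolution with a hypothetical kernel size of 3, the global branch is multi-head self-attention with identity query/key/value projections, and the two branches are applied one after the other. The function names `local_context`, `global_context`, and `lgtrm_sketch` are made up for this example.

```python
import numpy as np


def local_context(x, kernel):
    """Depthwise temporal convolution: each channel is filtered on its own.

    x: (T, D) clip features; kernel: (K, D) per-channel weights, K odd.
    """
    k = kernel.shape[0]
    pad = k // 2
    # Repeat the first/last clip so the output keeps length T (assumption).
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    # K shifted views of the sequence, stacked as (K, T, D).
    windows = np.stack([padded[i:i + x.shape[0]] for i in range(k)], axis=0)
    return (windows * kernel[:, None, :]).sum(axis=0)


def global_context(x, num_heads):
    """Multi-head self-attention over all clips.

    Identity Q/K/V projections are an assumption made to keep the sketch short;
    a real transformer layer would use learned projections.
    """
    t, d = x.shape
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = d // num_heads
    heads = x.reshape(t, num_heads, head_dim).transpose(1, 0, 2)  # (H, T, head_dim)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(head_dim)  # (H, T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over clips
    out = weights @ heads  # (H, T, head_dim)
    return out.transpose(1, 0, 2).reshape(t, d)


def lgtrm_sketch(x, kernel, num_heads=4):
    """Local aggregation followed by global aggregation (ordering is an assumption)."""
    return global_context(local_context(x, kernel), num_heads)


# Toy input: 8 clips with 16-dimensional features and a uniform 3-tap kernel.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
uniform_kernel = np.full((3, 16), 1.0 / 3.0)
enhanced = lgtrm_sketch(features, uniform_kernel)
print(enhanced.shape)  # (8, 16)
```

In this sketch each output clip first mixes only with its immediate neighbours and then attends to every clip in the video, which mirrors the local-range and global-range split described for DW-Net and TF-Net.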

This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0106502, in part by the Natural Science Foundation of China under Grant 62073105, in part by the Natural Science Foundation of Heilongjiang Province of China under Grant ZD2022F002, and in part by the Heilongjiang Touyan Innovation Team Program.

Author information

Authors and Affiliations

Faculty of Computing, Harbin Institute of Technology, Harbin, China
Xuan Li, Ding Ma & Xiangqian Wu

Corresponding author

Correspondence to Xiangqian Wu.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Li, X., Ma, D., Wu, X. (2024). Enhancing Feature Representation for Anomaly Detection via Local-and-Global Temporal Relations and a Multi-stage Memory. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_10

DOI: https://doi.org/10.1007/978-981-99-8537-1_10
Published: 26 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8536-4
Online ISBN: 978-981-99-8537-1
eBook Packages: Computer Science, Computer Science (R0)
