Abstract
Weakly supervised video anomaly detection is challenging because frame-level labels are not available at training time. Tackling this task effectively requires models to learn discriminative feature representations. To this end, we propose a multi-stage memory-augmented feature discrimination learning (MMFDL) method. The first stage obtains preliminary abnormal probabilities for clip features. In the second stage, an easy normal pattern memory (ENPM) is proposed to store normal patterns with low abnormal probabilities. In the last stage, we pull clip features with high abnormal probabilities in normal videos close to the ENPM and push them away from clip features with high abnormal probabilities in abnormal videos, so that the model learns more discriminative features for anomaly detection. Furthermore, we propose a local-and-global temporal relations modeling (LGTRM) module to enhance clip features by aggregating local and global contexts. The LGTRM module consists of two subnetworks: DW-Net and TF-Net. DW-Net integrates the current clip feature with its adjacent clip features to capture local-range temporal dependencies. TF-Net uses the multi-head self-attention mechanism of the transformer to capture global-range temporal dependencies. Experiments on two datasets demonstrate that our method outperforms state-of-the-art approaches. The code is available at https://github.com/xuanli01/PRCV347.
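The second and third stages described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the loss, so the triplet-style hinge, the Euclidean distance, the probability threshold `low_thresh`, the top-k selection, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_enpm(normal_feats, normal_probs, low_thresh=0.2):
    """Stage 2 (assumed form): store clip features from normal videos
    whose preliminary abnormal probability is below a threshold."""
    return [f for f, p in zip(normal_feats, normal_probs) if p < low_thresh]


def select_high(feats, probs, k):
    """Return the k clip features with the highest abnormal probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [feats[i] for i in order[:k]]


def mean_distance(anchors, targets):
    """Average pairwise distance between two non-empty sets of features."""
    total = sum(euclidean(a, t) for a in anchors for t in targets)
    return total / (len(anchors) * len(targets))


def discrimination_loss(hard_normal, memory, abnormal, margin=1.0):
    """Stage 3 (assumed triplet-style hinge): pull hard normal clips toward
    the memory and push them away from high-probability abnormal clips."""
    d_pos = mean_distance(hard_normal, memory)
    d_neg = mean_distance(hard_normal, abnormal)
    return max(0.0, d_pos - d_neg + margin)


# Toy data: 2-D clip features with stage-1 abnormal probabilities.
normal_feats = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.1]]
normal_probs = [0.05, 0.10, 0.70]
abnormal_feats = [[1.0, 1.0], [0.2, 0.1]]
abnormal_probs = [0.95, 0.30]

memory = build_enpm(normal_feats, normal_probs)
hard_normal = select_high(normal_feats, normal_probs, k=1)
abnormal = select_high(abnormal_feats, abnormal_probs, k=1)
loss = discrimination_loss(hard_normal, memory, abnormal)
```

In this toy setting the hard normal clip `[0.9, 0.1]` lies roughly as far from the memory as from the abnormal clip, so the hinge is active and the loss is positive. Minimizing it would move that clip toward the stored normal patterns.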
This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0106502, in part by the Natural Science Foundation of China under Grant 62073105, in part by the Natural Science Foundation of Heilongjiang Province of China under Grant ZD2022F002, and in part by the Heilongjiang Touyan Innovation Team Program.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, X., Ma, D., Wu, X. (2024). Enhancing Feature Representation for Anomaly Detection via Local-and-Global Temporal Relations and a Multi-stage Memory. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_10
DOI: https://doi.org/10.1007/978-981-99-8537-1_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8536-4
Online ISBN: 978-981-99-8537-1
eBook Packages: Computer Science, Computer Science (R0)