Abstract
Inconsistent data and unclear labels make it difficult to learn anomalous behavior from video, so methods based on deep clustering are gaining traction in this area. A deep clustering strategy usually relies on encoding and reconstruction to facilitate information discovery. However, reconstructing the input appears to serve little purpose once the model's learning process has concluded. Meanwhile, different input types carry different features, which may help identify the problem more accurately. To exploit such assorted features within clustering, we propose the Skeletal Based Autoencoder (SKELBA), which processes the different input types in parallel. The model includes a spatial graph convolution operator, which convolves the skeletal data more precisely. A decoder-less deep clustering architecture is introduced to enhance the stability of clustering. The relation between reconstruction error and the lower bound of mutual information (MI) motivates our look into decoder-free systems. Combining local–global feature collection with decoder-free encoder techniques yields improved results. Extensive experiments on various benchmark datasets show that the proposed model outperforms recently proposed approaches in the same field.
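To illustrate what a spatial graph convolution over skeletal data can look like, the following is a minimal sketch and not the SKELBA operator itself. It assumes the widely used normalized-adjacency formulation, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), which the abstract does not confirm. The 5-joint skeleton, the edge list, the joint coordinates, the weight matrix, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 5-joint skeleton (assumption, not from the paper):
# 0 = head, 1 = torso, 2 = left hand, 3 = right hand, 4 = hip.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} as a nested list."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0  # bones are undirected
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def spatial_graph_conv(features, adj_norm, weight):
    """One layer: aggregate over neighboring joints, project, apply ReLU."""
    aggregated = matmul(adj_norm, features)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Illustrative (x, y) joint coordinates for a single frame.
frame = [[0.0, 1.8], [0.0, 1.2], [-0.5, 1.0], [0.5, 1.0], [0.0, 0.8]]
# Illustrative projection from 2 input channels to 3 output channels.
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

adj_norm = normalized_adjacency(NUM_JOINTS, EDGES)
out = spatial_graph_conv(frame, adj_norm, weight)
print(len(out), len(out[0]))
```

Each output row is a per-joint feature vector that mixes the joint's own coordinates with those of the joints it is connected to, which is the sense in which the convolution follows the skeleton's structure rather than a pixel grid.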
Data availability
All datasets used in this research are publicly available; any other information or data is available on request.
Acknowledgements
This research was supported by the National Science Foundation of China (Nos. 62176221, 62276215, 62276216).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Javed, M.H., Yu, Z., Li, T. et al. Learning anomalous human actions using frames of interest and decoderless deep embedded clustering. Int. J. Mach. Learn. & Cyber. 14, 3575–3589 (2023). https://doi.org/10.1007/s13042-023-01851-4
DOI: https://doi.org/10.1007/s13042-023-01851-4