Learning anomalous human actions using frames of interest and decoderless deep embedded clustering

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Inconsistent data and unclear labels make it difficult to learn anomalous behavior from video, so methods based on deep clustering have become popular in this area. A deep clustering strategy usually relies on encoding and reconstruction to facilitate information discovery. However, reconstructing the input serves little purpose once the model’s learning process has concluded. Meanwhile, different input types carry different features, which may help identify the problem more accurately. To use such assorted features together with clustering, we propose the Skeletal Based Autoencoder (SKELBA), which processes the different input types in parallel. The model includes a spatial graph convolution operator, which convolves the skeletal data more precisely. A decoder-less deep clustering architecture is introduced to enhance the stability of clustering. The relation between reconstruction error and the lower bound of mutual information (MI) motivates the decoder-free design. Combining local–global feature collection with decoder-free encoders yields improved results. Extensive experiments on various benchmark datasets show that the proposed model outperforms recently proposed approaches in the same field.
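To make the idea of clustering without a decoder more concrete, the sketch below computes the standard deep embedded clustering (DEC) objective directly on embeddings, with no reconstruction term. This is an illustration under assumptions, not the SKELBA loss: the abstract does not give the paper's exact objective, and the function names, the toy embeddings, and the centroids are all hypothetical.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedding i to centroid j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """Sharpened targets: p[i][j] is proportional to q[i][j]**2 / f_j, with f_j = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )

# Toy 2-D embeddings (hypothetical values standing in for encoder outputs).
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_clustering_loss(p, q)
# Hard cluster label for each embedding: index of the largest soft assignment.
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
```

In a trainable model the loss would be backpropagated into the encoder and the centroids; here it is only evaluated once to show that the objective depends on the embeddings alone.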

Data availability

All datasets used in this research are publicly available; any other information or data is available on request.

Acknowledgements

This research was supported by the National Science Foundation of China (Nos. 62176221, 62276215, 62276216).

Author information

Corresponding author

Correspondence to Tianrui Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Javed, M.H., Yu, Z., Li, T. et al. Learning anomalous human actions using frames of interest and decoderless deep embedded clustering. Int. J. Mach. Learn. & Cyber. 14, 3575–3589 (2023). https://doi.org/10.1007/s13042-023-01851-4

  • DOI: https://doi.org/10.1007/s13042-023-01851-4
