Abstract
Three-dimensional convolutional neural networks (3D-CNNs) and fully connected long short-term memory networks (FC-LSTMs) have proven to be powerful non-intrusive approaches to fall detection. However, 3D-CNN-based feature extraction requires a large-scale dataset. Meanwhile, an FC-LSTM flattens its input into a one-dimensional vector, which leads to the loss of spatial information. To address these problems, this paper proposes a novel model that combines a lightweight 3D-CNN with convolutional long short-term memory (ConvLSTM) networks. In this model, a lightweight five-layer 3D-CNN is presented to avoid over-fitting. To extract more discriminative features, channel-wise and spatial-wise attention modules are adopted in each layer to improve detection performance. In addition, the ConvLSTM is used to extract long-term spatial–temporal features from 3D tensors. Finally, we validate the model through extensive experiments and comprehensive comparisons on HMDB5, UCF11, URFD, and MCFD. Experimental results on these public benchmarks demonstrate that our method outperforms current state-of-the-art single-stream networks, achieving 62.55 ± 7.99% on HMDB5, 97.28 ± 0.36% on UCF11, 98.06 ± 0.32% on URFD, and 94.84 ± 4.64% on MCFD.
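The abstract's point that an FC-LSTM loses spatial information while a ConvLSTM keeps it can be made concrete with a small sketch. The NumPy code below is a minimal, illustrative ConvLSTM cell that follows the gate equations of Shi et al. (2015) but omits the peephole terms. The 3×3 kernel, channel counts, random initialisation, input sizes, and the names `ConvLSTMCell` and `conv2d_same` are all assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np


def conv2d_same(x, w):
    """Multi-channel 2D cross-correlation with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Window of the padded input aligned with kernel tap (i, j)
            patch = xp[:, i:i + h, j:j + wd]
            # Sum over input channels for this tap
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMCell:
    """Hypothetical minimal ConvLSTM cell (no peephole connections)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel bank producing all four gates from [X_t, H_{t-1}]
        # stacked along the channel axis (equivalent to W_x*X + W_h*H).
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))
        self.hid_ch = hid_ch

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c_next = f * c + i * g          # cell state keeps its (C, H, W) grid
        h_next = o * np.tanh(c_next)    # hidden state is also a feature map
        return h_next, c_next

    def run(self, seq):
        """seq: (T, C_in, H, W). Returns final (hidden, cell) states."""
        _, _, hgt, wid = seq.shape
        h = np.zeros((self.hid_ch, hgt, wid))
        c = np.zeros_like(h)
        for x in seq:
            h, c = self.step(x, h, c)
        return h, c


# Hypothetical input: 8 time steps of 16-channel 7x7 feature maps,
# e.g. what a 3D-CNN backbone might emit after spatial downsampling.
feats = np.random.default_rng(1).normal(size=(8, 16, 7, 7))
cell = ConvLSTMCell(in_ch=16, hid_ch=32)
h, c = cell.run(feats)
print(h.shape)                     # (32, 7, 7): spatial grid preserved
print(feats[0].reshape(-1).shape)  # (784,): what an FC-LSTM would consume
```

Because every gate is computed by convolution rather than by a dense matrix product on a flattened vector, each hidden unit stays tied to a spatial location, which is the property the abstract relies on when it replaces the FC-LSTM with a ConvLSTM.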
Data availability
Some or all data, models, or code generated or used during the study are available from the corresponding author upon request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61962019), the Natural Science Foundation of Jiangxi Province (20224BAB212016), the China Scholarship Council (No. 202106825021), and the Natural Science Foundation of Shaanxi Province (2020NY-175).
Ethics declarations
Conflict of interest
All authors declare that there is no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Su, C., Wei, J., Lin, D. et al. A novel model for fall detection and action recognition combined lightweight 3D-CNN and convolutional LSTM networks. Pattern Anal Applic 27, 3 (2024). https://doi.org/10.1007/s10044-024-01224-9
DOI: https://doi.org/10.1007/s10044-024-01224-9