Abstract
To assess students' in-class listening state objectively and accurately, we infer students' emotions from their facial expressions in class and cognitive feedback from their in-class behaviors, and then fuse the two to obtain a comprehensive assessment of classroom state. The major difficulty in capturing students' classroom expressions, however, is how to extract expression features accurately and efficiently from both the temporal and spatial dimensions of classroom videos. To solve this problem, we propose a classroom expression recognition model based on a spatio-temporal residual attention network (STRAN), which extracts facial expression features through convolution in both the temporal and spatial dimensions under limited resources while minimizing time consumption. Specifically, STRAN first uses a residual network with three-dimensional convolutions to counter network degradation as the depth of the convolutional neural network increases, which also accelerates convergence at the same number of layers. Second, a spatio-temporal attention mechanism is introduced so that the network can focus on the important video frames and the key regions within each frame. To improve the comprehensiveness and correctness of the final classroom evaluation, we use a deep convolutional neural network to capture students' behaviors while obtaining their classroom expressions. We then propose an intelligent classroom state assessment method (Weight_classAssess) that combines students' expressions and behaviors to evaluate the classroom state.
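The paper's full architecture is not reproduced in this abstract, but the spatio-temporal attention idea — weighting whole frames along the temporal axis and regions within each frame along the spatial axes — can be illustrated with a minimal NumPy sketch. All shapes and scores below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatio_temporal_attention(features, frame_scores, spatial_scores):
    """Re-weight a clip of frame features (illustrative sketch).

    features:       (T, H, W, C) feature maps for T frames
    frame_scores:   (T,)         temporal importance logits
    spatial_scores: (T, H, W)    per-frame spatial importance logits
    Returns a (C,) clip descriptor emphasising important frames/regions.
    """
    t_w = softmax(frame_scores)                                    # temporal weights (T,)
    s_w = softmax(spatial_scores.reshape(len(frame_scores), -1))   # spatial weights per frame
    s_w = s_w.reshape(spatial_scores.shape)                        # back to (T, H, W)
    # Spatially pooled frame descriptors, then temporal pooling.
    frame_desc = (features * s_w[..., None]).sum(axis=(1, 2))      # (T, C)
    return (frame_desc * t_w[:, None]).sum(axis=0)                 # (C,)

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 4, 4, 16))   # 8 frames, 4x4 feature maps, 16 channels
desc = spatio_temporal_attention(clip,
                                 rng.standard_normal(8),
                                 rng.standard_normal((8, 4, 4)))
print(desc.shape)  # (16,)
```

In the actual model the attention logits would be produced by learned layers on top of 3D-convolutional residual features; here they are random placeholders to show only the weighting mechanics.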
Finally, building on the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets, CK+_Class and FER2013_Class, which better match the classroom-teaching scenario, by adding collected video sequences of students in class and images of students' in-class expressions. Compared with existing methods, STRAN achieves facial expression recognition rates of 93.84% and 80.45% on the CK+ and CK+_Class datasets, respectively, and the accuracy of Weight_classAssess-based intelligent classroom assessment reaches 78.19%, which demonstrates the effectiveness of the proposed method.
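The abstract does not give the weighting scheme or score scales used by Weight_classAssess. As a hedged illustration only, a fused classroom-state score could combine per-student expression-based and behavior-based evidence with a tunable weight; the 0.6/0.4 split and the example scores below are assumptions, not the paper's values:

```python
def weighted_class_assess(expr_score, behav_score, w_expr=0.6):
    """Fuse an expression-based score and a behavior-based score
    (both assumed in [0, 1]) into a single classroom-state score.
    w_expr is the expression weight; the behavior weight is its complement.
    """
    assert 0.0 <= w_expr <= 1.0
    return w_expr * expr_score + (1.0 - w_expr) * behav_score

# e.g. a student rated 0.9 "engaged" by expression but 0.5 by behavior:
print(round(weighted_class_assess(0.9, 0.5), 2))  # 0.74
```

A convex combination like this keeps the fused score on the same scale as its inputs, so a single threshold can still classify the overall classroom state.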
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61877006, No. 62272058), and CAAI-Huawei MindSpore Open Fund (No.CAAIXSJLJJ-2021-007B).
About this article
Cite this article
Chen, Z., Liang, M., Xue, Z. et al. STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos. Appl Intell 53, 25310–25329 (2023). https://doi.org/10.1007/s10489-023-04858-0