
STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos

Published in Applied Intelligence.

Abstract

To assess students’ listening state in class objectively and accurately, we can infer students’ emotions from their facial expressions and their cognitive feedback from their in-class behaviors, and then integrate the two into a comprehensive assessment of classroom state. The major challenge in obtaining students’ classroom expressions, however, is how to accurately and efficiently extract expression features from both the temporal and spatial dimensions of classroom videos. To solve this problem, we propose a classroom expression recognition model based on a spatio-temporal residual attention network (STRAN), which extracts facial expression features through convolution in both the temporal and spatial dimensions under limited resources, with short time consumption and strong performance. Specifically, STRAN first uses a residual network with three-dimensional convolutions to alleviate the network degradation that arises as the depth of a convolutional neural network increases, accelerating the convergence of the whole network at the same number of layers. Second, a spatio-temporal attention mechanism is introduced so that the network can effectively focus on important video frames and on the key areas within each frame. To enhance the comprehensiveness and correctness of the final classroom evaluation, we use a deep convolutional neural network to capture students’ behaviors while obtaining their classroom expressions. Then, an intelligent classroom state assessment method (Weight_classAssess), which combines students’ expressions and behaviors, is proposed to evaluate the classroom state.
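As a rough illustration only (not the authors’ implementation, whose details are not given in this abstract), the spatio-temporal attention idea can be sketched in NumPy: score each frame and each spatial location by its mean activation, normalise the scores with a softmax, and reweight the feature volume so that salient frames and regions dominate. The function name and the mean-activation scoring rule are our own simplifying assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def spatiotemporal_attention(features):
    """Reweight a clip's feature volume along time and space.

    features: array of shape (T, H, W, C) -- T frames of C-channel maps.
    Returns the reweighted features plus the two attention maps.
    """
    T, H, W, C = features.shape
    # Temporal attention: score each frame by its mean activation.
    t_scores = features.reshape(T, -1).mean(axis=1)
    t_weights = softmax(t_scores)                        # shape (T,)
    # Spatial attention: score each location by its mean activation
    # across frames and channels.
    s_scores = features.mean(axis=(0, 3))                # shape (H, W)
    s_weights = softmax(s_scores.ravel()).reshape(H, W)
    # Broadcast both weight maps over the feature volume.
    out = (features
           * t_weights[:, None, None, None]
           * s_weights[None, :, :, None])
    return out, t_weights, s_weights
```

In a trained network the scores would be learned (e.g. by small convolutional branches) rather than taken as raw mean activations; the sketch only shows how the two attention maps factorise over the time and space axes.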
Finally, on the basis of the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets, CK+_Class and FER2013_Class, which better suit the classroom teaching scene, by adding collected video sequences of students in class and images of students’ in-class expressions. The proposed method is compared with existing methods, and the results show that STRAN achieves facial expression recognition rates of 93.84% and 80.45% on the CK+ and CK+_Class datasets, respectively. The accuracy of the Weight_classAssess-based intelligent classroom assessment of students also reaches 78.19%, which demonstrates the effectiveness of the proposed method.
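The abstract does not specify the fusion rule inside Weight_classAssess; as a hypothetical sketch, combining a per-student expression score with a behavior score could be a simple convex weighting. The function name, the 0–1 score scale, and the default weights below are all illustrative assumptions, not the paper’s formula.

```python
def weight_class_assess(expr_score, behav_score, w_expr=0.5, w_behav=0.5):
    """Hypothetical weighted fusion of the two assessment channels.

    expr_score:  emotion score derived from facial expressions, in [0, 1].
    behav_score: cognitive-feedback score derived from behaviors, in [0, 1].
    The weights must form a convex combination so the result stays in [0, 1].
    """
    if abs(w_expr + w_behav - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_expr * expr_score + w_behav * behav_score
```

In practice the two weights would be tuned on labelled classroom data; the sketch only conveys that the final classroom-state score integrates both channels rather than relying on expressions alone.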



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61877006, No. 62272058), and CAAI-Huawei MindSpore Open Fund (No.CAAIXSJLJJ-2021-007B).

Author information

Correspondence to Meiyu Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Z., Liang, M., Xue, Z. et al. STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos. Appl Intell 53, 25310–25329 (2023). https://doi.org/10.1007/s10489-023-04858-0

