A spatiotemporal network using a local spatial difference stack block for facial micro-expression recognition

Multimedia Tools and Applications

Abstract

Recently, video-based micro-expression recognition (MER) applications have attracted attention in various scenarios. However, current deep learning-based MER methods still struggle with several challenges, such as insufficient data, difficulty in capturing subtle facial motions, and keyframe recognition. In this paper, we propose a robust MER solution that requires no prior annotation of keyframes. Because traditional data augmentation techniques can destroy the slight motion information in the frame sequence, we design stride sampling, which increases the number of samples while preserving the important motion features of the micro-expression (ME). Moreover, to capture rapid and subtle facial changes and thereby improve the accuracy of ME classification, we construct a local spatial difference stack (LSDS) block and incorporate it into the lightweight spatiotemporal network VGGFace-TCN. Experiments demonstrate that the proposed algorithm can effectively detect the local facial movement details of MEs from the original frames without additional visual features (e.g., optical flow) while minimizing the risk of overfitting. Compared with other state-of-the-art methods, the proposed method obtained the best performance under the holdout database evaluation (HDE) strategy, with an accuracy of 57.46% and an F1-score of 0.3734. Furthermore, it attained an accuracy of 61.27% and an F1-score of 0.5343 on the Spontaneous Actions and Micro-movements (SAMM) dataset, which is significantly higher than the results of other state-of-the-art methods.
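The abstract does not spell out how stride sampling is implemented. As a rough illustration only, the sketch below assumes it means splitting one clip into several interleaved sub-clips, each taking every `stride`-th frame from a different starting offset, so that every sub-clip still spans the whole clip in its original temporal order. The function name `stride_sample` and the `stride` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stride_sample(frames: Sequence[T], stride: int) -> List[List[T]]:
    """Split one clip into up to `stride` interleaved sub-clips.

    Assumption (not confirmed by the paper): sub-clip k keeps frames
    k, k + stride, k + 2 * stride, ... so each sub-clip covers the
    full duration of the clip and frame order is never shuffled.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    # Offsets beyond the clip length would give empty sub-clips, so skip them.
    return [list(frames[offset::stride])
            for offset in range(min(stride, len(frames)))]


# Frame indices stand in for real image frames in this toy example.
clip = list(range(10))
subclips = stride_sample(clip, stride=3)
print(subclips)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Under this assumption, one clip yields up to `stride` training samples, and no frame is altered or reordered, which is in the spirit of the stated goal of not disturbing the slight motion cues.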

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant 62076103, the Guangzhou Science and Technology Plan Project Key Field R&D Project under grant 202007030005, and the Guangdong Natural Science Foundation of China under grant 2019A1515011375.

Author information

Contributions

Yan Liang: Conceptualization, Methodology, Writing-review and editing. Yan Hao: Verification, Visualization, Writing-original draft. Jiacheng Liao: Verification, Visualization. Zhuoran Deng: Investigation, Data curation. Xing Wen: Investigation, Data curation. Zefeng Zheng: Data curation. Jiahui Pan: Supervision, Funding acquisition.

Corresponding author

Correspondence to Jiahui Pan.

Ethics declarations

Competing interest

All the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, Y., Hao, Y., Liao, J. et al. A spatiotemporal network using a local spatial difference stack block for facial micro-expression recognition. Multimed Tools Appl 83, 11593–11612 (2024). https://doi.org/10.1007/s11042-023-16033-1

  • DOI: https://doi.org/10.1007/s11042-023-16033-1
