Facial micro-expression recognition using three-stream vision transformer network with sparse sampling and relabeling

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Most existing micro-expression recognition (MER) methods are based on convolutional neural networks (CNNs) and can learn better representations than conventional handcrafted methods. Nevertheless, the local receptive field of CNNs leads to poor global feature extraction and thus limits accuracy. In contrast, the vision transformer, an alternative architecture, can capture global facial information and outperforms CNNs in many vision tasks. However, applying it directly to MER may not be as effective as expected, since the insufficient data and class imbalance of existing ME datasets severely restrict accuracy. To address these problems, we propose a three-stream vision transformer-based network with sparse sampling and relabeling (SSRLTS-ViT). First, the network learns discriminative ME representations from three optical flow components. Second, a sparse sampling strategy adds to the training set the optical flow components computed between the onset frame and frames around the apex, which enlarges the sample capacity while preserving the differences between samples. Moreover, we introduce a relabeling mechanism that reassigns training data with corrected labels to reduce the impact of subjective annotation, further improving recognition accuracy. Experimental results on two benchmarks show that SSRLTS-ViT outperforms other competing methods, achieving a UF1 of 0.843 and a UAR of 0.853 on the 3-class datasets and a UF1 of 0.795 and a UAR of 0.801 on the 5-class datasets, respectively.
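
To make the sparse sampling and three-stream input concrete, the following minimal Python sketch (not the authors' implementation) pairs the onset frame with a few frames sampled around the apex and computes three optical-flow components for each pair. OpenCV's Farneback flow is used here only as a readily available stand-in for the dense optical flow the authors compute, and the choice of components (horizontal flow, vertical flow, and flow magnitude) is an assumption made for illustration.

import cv2
import numpy as np

def flow_components(onset_gray, target_gray):
    """Return an (H, W, 3) map of [u, v, magnitude] between two grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(
        onset_gray, target_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(u ** 2 + v ** 2)
    return np.stack([u, v, magnitude], axis=-1)

def sparse_sample(frames, apex_idx, offsets=(-2, -1, 0, 1, 2)):
    """Sparse sampling: pair the onset frame (index 0) with several frames
    around the apex, enlarging the training set while keeping samples distinct."""
    onset_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    samples = []
    for offset in offsets:
        idx = min(max(apex_idx + offset, 1), len(frames) - 1)
        target_gray = cv2.cvtColor(frames[idx], cv2.COLOR_BGR2GRAY)
        samples.append(flow_components(onset_gray, target_gray))
    return samples

Each resulting map can be split channel-wise and fed to the three transformer streams during training; the exact offsets, flow method, and components used by SSRLTS-ViT may differ from this sketch.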

Data availability

The datasets used in our paper (SMIC-HS, CASME II, and SAMM) are publicly available.

Funding

This work was supported in part by the Key R&D Program of Hunan (2022SK2104), the Leading Plan for Scientific and Technological Innovation of High-tech Industries of Hunan (2022GK4010), the National Key R&D Program of China (2021YFF0900600), the China Scholarship Council (CSC, No. 202306130012 and 202306130013), and the National Natural Science Foundation of China (61672222).

Author information

Corresponding author

Correspondence to Hanling Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Yin, L., Zhang, H. et al. Facial micro-expression recognition using three-stream vision transformer network with sparse sampling and relabeling. SIViP 18, 3761–3771 (2024). https://doi.org/10.1007/s11760-024-03039-x
