
A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies

  • 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Multimedia Tools and Applications

Abstract

Detecting whether a video is natural or a DeepFake has become a research hotspot. However, almost all existing works focus on detecting inconsistency in either the spatial or the temporal domain. In this paper, a dual-branch neural network (a spatial branch and a temporal branch) is proposed to detect inconsistencies in both domains for DeepFake video detection. The spatial branch detects spatial inconsistency with the EfficientNet model. The temporal branch detects temporal inconsistency with a new network model that takes optical flow as input, uses EfficientNet to extract optical flow features, and uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture the temporal inconsistency of the optical flow. Moreover, the optical flow frames are stacked before being fed into EfficientNet. Finally, the softmax scores of the two branches are combined by a binary linear SVM classifier. Experimental results on the compressed FaceForensics++ dataset and the Celeb-DF dataset show that: (a) the proposed dual-branch network performs better than some recent spatial and temporal models on the Celeb-DF dataset and on all four manipulation methods in the FaceForensics++ dataset, since the two branches complement each other; (b) ablation experiments show that the use of optical flow inputs, Bi-LSTM, and dual branches can greatly improve the detection performance.
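
The final fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scores are made-up toy values, the helper names (`centre`, `train_linear_svm`, `fuse`) are hypothetical, centring the scores at 0.5 is an assumption added for numerical convenience, and a small pure-Python subgradient solver stands in for whatever SVM library was actually used (in practice a library class such as scikit-learn's `LinearSVC` would be the usual choice). Each video is represented by the fake-class softmax score of the spatial branch and of the temporal branch, and a linear SVM is trained on these two-dimensional vectors.

```python
# Toy sketch of score-level fusion with a binary linear SVM.
# All data and names are illustrative assumptions, not taken from the paper.

def centre(spatial_score, temporal_score):
    """Shift fake-class probabilities from [0, 1] to [-0.5, 0.5] (assumption)."""
    return [spatial_score - 0.5, temporal_score - 0.5]

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=1000):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    samples: list of 2-D feature vectors; labels: +1 (fake) or -1 (real).
    Returns the weight vector w and the bias b.
    """
    n = len(samples)
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                grad_w[0] -= y * x[0] / n
                grad_w[1] -= y * x[1] / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

def fuse(spatial_score, temporal_score, w, b):
    """Classify one video from the two branch scores."""
    x = centre(spatial_score, temporal_score)
    decision = w[0] * x[0] + w[1] * x[1] + b
    return "fake" if decision > 0 else "real"

# Hypothetical (spatial, temporal) fake-class softmax scores for training videos.
fake_scores = [(0.90, 0.80), (0.70, 0.90), (0.40, 0.90), (0.85, 0.45)]
real_scores = [(0.10, 0.20), (0.30, 0.10), (0.20, 0.35), (0.45, 0.15)]

samples = [centre(s, t) for s, t in fake_scores + real_scores]
labels = [1] * len(fake_scores) + [-1] * len(real_scores)

w, b = train_linear_svm(samples, labels)

print(fuse(0.95, 0.90, w, b))  # both branches confident the video is fake
print(fuse(0.05, 0.10, w, b))  # both branches confident the video is real
```

Training a classifier on the two scores, rather than simply averaging them, lets the fusion stage learn how much weight each branch deserves, which is one plausible reading of why the two branches are said to complement each other.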



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62072251, the Natural Science Research Project of Jiangsu Universities under Grant 20KJB520021, the Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province under Grant No. 2019(26), and the PAPD fund.

Author information


Corresponding author

Correspondence to Beijing Chen.


Cite this article

Kuang, L., Wang, Y., Hang, T. et al. A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies. Multimed Tools Appl 81, 42591–42606 (2022). https://doi.org/10.1007/s11042-021-11539-y


  • DOI: https://doi.org/10.1007/s11042-021-11539-y
