A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies

Kuang, Liang; Wang, Yiting; Hang, Tian; Chen, Beijing; Zhao, Guoying

doi:10.1007/s11042-021-11539-y

1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Published: 12 July 2022

Volume 81, pages 42591–42606, (2022)
Multimedia Tools and Applications

Liang Kuang^1,2, Yiting Wang^3, Tian Hang^1, Beijing Chen^1,4 (ORCID: orcid.org/0000-0002-2506-0427) & Guoying Zhao^5

Abstract

Detecting whether a video is natural or a DeepFake has become a research hotspot. However, almost all existing works focus on detecting inconsistency in either the spatial or the temporal domain alone. In this paper, a dual-branch neural network (a spatial branch and a temporal branch) is proposed to detect inconsistency in both domains for DeepFake video detection. The spatial branch detects spatial inconsistency with the EfficientNet model. The temporal branch detects temporal inconsistency with a new network model that takes optical flow as input, uses EfficientNet to extract optical flow features, and uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture the temporal inconsistency of the optical flow. Moreover, the optical flow frames are stacked before being fed into EfficientNet. Finally, the softmax scores of the two branches are combined by a binary linear SVM classifier. Experimental results on the compressed FaceForensics++ dataset and the Celeb-DF dataset show that: (a) the proposed dual-branch network performs better than some recent spatial and temporal models on the Celeb-DF dataset and on all four manipulation methods in the FaceForensics++ dataset, since the two branches complement each other; (b) according to the ablation experiments, the use of optical flow inputs, the Bi-LSTM, and the dual-branch design can greatly improve detection performance.
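
The final fusion step described in the abstract, combining the softmax scores of the two branches with a binary linear SVM, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the per-video scores are invented toy values, the feature vector is taken to be the pair (spatial fake probability, temporal fake probability), and the SVM is a small hand-written subgradient-descent solver rather than whatever library or settings the authors used.

```python
# Illustrative sketch only: toy scores and a tiny hand-written linear SVM,
# not the authors' implementation, hyperparameters, or data.

def train_linear_svm(features, labels, lr=0.1, reg=0.01, epochs=2000):
    """Fit (w, b) by batch subgradient descent on the L2-regularised hinge loss.

    labels must be +1 (fake) or -1 (real).
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (fake) or -1 (real) from the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical per-video scores: (spatial P(fake), temporal P(fake)).
# One fake is missed by the spatial branch alone, another by the temporal branch alone.
fake_scores = [(0.9, 0.6), (0.6, 0.9), (0.8, 0.8), (0.7, 0.95), (0.35, 0.9), (0.9, 0.4)]
real_scores = [(0.1, 0.3), (0.3, 0.1), (0.2, 0.2), (0.4, 0.15), (0.15, 0.45), (0.05, 0.05)]
scores = fake_scores + real_scores
labels = [1] * len(fake_scores) + [-1] * len(real_scores)

w, b = train_linear_svm(scores, labels)

# Baselines: threshold each branch's score at 0.5 on its own.
spatial_acc = accuracy([1 if s >= 0.5 else -1 for s, _ in scores], labels)
temporal_acc = accuracy([1 if t >= 0.5 else -1 for _, t in scores], labels)
fused_acc = accuracy([predict(w, b, x) for x in scores], labels)

print(f"spatial only:  {spatial_acc:.3f}")
print(f"temporal only: {temporal_acc:.3f}")
print(f"SVM fusion:    {fused_acc:.3f}")
```

In this toy setup the accuracies are measured on the same twelve videos used for fitting, so the numbers only show how a linear decision boundary over both scores can recover a video that one branch scores below 0.5. They say nothing about the performance reported in the paper.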

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62072251, the Natural Science Research Project of Jiangsu Universities under Grant 20KJB520021, the Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province under Grant No. 2019(26), and the PAPD fund.

Author information

Authors and Affiliations

School of Computer, Nanjing University of Information Science and Technology, Nanjing, 210044, China
Liang Kuang, Tian Hang & Beijing Chen
School of IoT Engineering, Jiangsu Vocational College of Information Technology, Wuxi, 214153, China
Liang Kuang
Warwick Manufacturing Group, University of Warwick, Coventry, CV4 7AL, UK
Yiting Wang
Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, 210044, China
Beijing Chen
Center for Machine Vision and Signal Analysis, University of Oulu, 90014, Oulu, Finland
Guoying Zhao

Corresponding author

Correspondence to Beijing Chen.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kuang, L., Wang, Y., Hang, T. et al. A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies. Multimed Tools Appl 81, 42591–42606 (2022). https://doi.org/10.1007/s11042-021-11539-y

Received: 18 May 2021
Revised: 17 August 2021
Accepted: 09 September 2021
Published: 12 July 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11042-021-11539-y
