Abstract
Detecting whether a video is natural or DeepFake has become a research hotspot. However, almost all existing works focus on detecting inconsistencies in either the spatial or the temporal domain alone. In this paper, a dual-branch neural network, consisting of a spatial branch and a temporal branch, is proposed to detect inconsistencies in both the spatial and temporal domains for DeepFake video detection. The spatial branch detects spatial inconsistencies using the EfficientNet model. The temporal branch detects temporal inconsistencies using a new network model, which takes optical flow as input, uses EfficientNet to extract optical-flow features, and uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to capture the temporal inconsistency of the optical flow. Moreover, the optical-flow frames are stacked before being fed into EfficientNet. Finally, the softmax scores of the two branches are combined by a binary linear SVM classifier. Experimental results on the compressed FaceForensics++ and Celeb-DF datasets show that: (a) the proposed dual-branch network outperforms several recent spatial and temporal models on Celeb-DF and on all four manipulation methods in FaceForensics++, because the two branches complement each other; (b) ablation experiments show that the use of optical-flow inputs, the Bi-LSTM, and the dual-branch structure greatly improves detection performance.
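The abstract states only that the softmax scores of the two branches are combined with a binary linear SVM. The following is a minimal sketch of one plausible reading of that fusion step, not the authors' implementation. It assumes that each branch emits a two-class softmax vector `[p_real, p_fake]` per video and that the SVM input is simply the concatenation of the two vectors. The SVM is trained here by plain full-batch subgradient descent on the hinge loss; the abstract does not name the solver, so this choice is also an assumption. All function names (`softmax`, `fuse_features`, `train_linear_svm`, `predict`) are hypothetical, and the toy scores are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one branch's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate both branches' softmax scores into one SVM input."""
    return list(spatial) + list(temporal)


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit a binary linear SVM (labels +1 = fake, -1 = real) by full-batch
    subgradient descent on the primal soft-margin objective
        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w.x_i + b)).
    The bias b is not regularized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # subgradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (_dot(w, xi) + b) < 1.0:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (fake) or -1 (real) from the sign of the SVM decision value."""
    return 1 if _dot(w, x) + b >= 0.0 else -1


if __name__ == "__main__":
    # Toy, invented branch outputs: [p_real, p_fake] for each branch.
    X = [
        fuse_features([0.9, 0.1], [0.8, 0.2]),  # real
        fuse_features([0.7, 0.3], [0.9, 0.1]),  # real
        fuse_features([0.2, 0.8], [0.3, 0.7]),  # fake
        fuse_features([0.4, 0.6], [0.1, 0.9]),  # fake
    ]
    y = [-1, -1, 1, 1]
    w, b = train_linear_svm(X, y)
    print("training predictions:", [predict(w, b, x) for x in X])
    # A new video: each branch's logits are converted to softmax scores first.
    probe = fuse_features(softmax([-2.0, 2.0]), softmax([-1.0, 1.0]))
    print("probe prediction:", predict(w, b, probe))
```

Each softmax pair sums to one, so the four-dimensional fused feature carries only two free values: the fake probability from the spatial branch and the fake probability from the temporal branch. Under this assumed layout, the linear SVM therefore learns a weighted vote between spatial and temporal evidence plus a bias. This is one way the two branches could complement each other at the decision level.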
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62072251, the Natural Science Research Project of Jiangsu Universities under Grant 20KJB520021, the Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province under Grant No. 2019(26), and the PAPD fund.
Cite this article
Kuang, L., Wang, Y., Hang, T. et al. A dual-branch neural network for DeepFake video detection by detecting spatial and temporal inconsistencies. Multimed Tools Appl 81, 42591–42606 (2022). https://doi.org/10.1007/s11042-021-11539-y