A Novel Defensive Strategy for Facial Manipulation Detection Combining Bilateral Filtering and Joint Adversarial Training

Facial manipulation enables facial expressions to be tampered with or facial identities to be replaced in videos. The fake videos are so realistic that they are difficult even for human eyes to distinguish. This poses a great threat to social and public information security. A number of facial manipulation detectors have been proposed to address this threat. However, previous studies have shown that the accuracy of these detectors is sensitive to adversarial examples. The existing defense methods are very limited in terms of applicable scenarios and defense effects. This paper proposes a new defense strategy for facial manipulation detectors, which combines a passive defense method, bilateral filtering, with a proactive defense method, joint adversarial training, to mitigate the vulnerability of facial manipulation detectors to adversarial examples. The bilateral filtering method is applied in the preprocessing stage to denoise the input adversarial examples, without any modification to the model. Joint adversarial training starts from the training stage of the model and mixes various adversarial examples with original examples to train the model. The introduction of joint adversarial training yields a model that defends against multiple adversarial attacks. The experimental results show that the proposed defense strategy effectively helps facial manipulation detectors counter adversarial examples.


Introduction
Facial manipulation refers to swapping the target face with the source face, in the two forms of identity exchange and expression exchange. Identity exchange means swapping the entire face of the target and the source characters, which changes the identity. Expression exchange, on the other hand, only changes the facial expression but does not change the identity. Figure 1 shows two examples of real-world applications of facial manipulation [1,2]. These two examples in Figure 1 reveal the threat posed by facial manipulation videos to the field of security detection. With the continuous evolution of facial manipulation, the techniques required for facial manipulation methods are becoming cheaper, the training data sets required are becoming smaller, and the resulting fake videos are becoming more realistic. Timely and effective countermeasures are needed; otherwise, the consequences will be unbearable.
To cope with the threat posed by facial manipulation videos, a number of facial manipulation detectors have been proposed. Facial manipulation detectors can be grouped into two broad categories: one is based on manual feature extraction [3][4][5], and the other on various deep neural networks [6][7][8][9][10]. Compared with traditional feature extraction methods, deep neural network-based methods generally have better detection performance. However, existing deep neural network-based facial manipulation detection models [11][12][13] are highly vulnerable to adversarial attacks; these models clearly have security vulnerabilities. Because of this security problem, five adversarial attack methods were used in [13] to attack two kinds of networks [3,14]. Gandhi and Jain [11] used the classic fast gradient sign method (FGSM) [15] and the Carlini and Wagner (C&W) [16] methods against the facial manipulation detectors ResNet [17] and VGG [18]. Neekhara et al. [12] performed white- and black-box attacks on XceptionNet [19] and MesoNet [7] using a gradient sign-based approach for perturbation optimization. If these models are deployed, they will inevitably entail huge risks and irreparable losses. Therefore, it is necessary to improve the security of facial manipulation detectors. However, Gandhi and Jain [11] only applied two defense methods, Lipschitz regularization and deep image prior (DIP), to enhance the security of face swap detectors, and the accuracy improvement these two methods provide is very limited. There is no other defense against adversarial attacks. Therefore, research on defense strategies for facial manipulation detectors needs to be further explored.
To address the above problems, this paper proposes a new defense strategy for facial manipulation detectors. This strategy designs effective methods to defend against adversarial example attacks from both passive and proactive perspectives. Passive defense means defending against adversarial attacks without modifying the structure and parameters of the model, while proactive defense trains a new model so that it exhibits strong robustness against the attacks. The defense strategy proposed in this paper comprises a passive defense method based on bilateral filtering and a proactive defense method based on joint adversarial training. The contributions of this work can be summarized as follows: (1) Three adversarial attack methods, the basic iterative method (BIM) [20], projected gradient descent (PGD) [21], and fast FGSM (FFGSM) [22], have been used to perform white- and black-box attacks on the trained Xception model [19]. (2) The bilateral filtering method is introduced as the passive defense method in our proposed defense strategy. It requires no additional model training, only simple processing of the input data in the data preprocessing stage. To the best of our knowledge [11], we are the first to propose a passive defense method for facial manipulation detectors. (3) The joint adversarial training method is the proactive defense method in our proposed defense strategy. This method enables facial manipulation detectors to defend against multiple adversarial attacks.
The results show state-of-the-art performance in securing facial manipulation detectors.

Facial Manipulation Methods.
In terms of identity exchange, classic facial manipulation algorithms include DeepFakes [23], FaceSwap [24], FSGAN [25], FaceShifter [26], and so forth. Among them, DeepFakes [23] was designed based on the VAE (variational autoencoder) and GAN (generative adversarial network). The idea of this algorithm was based on the unsupervised image-to-image transformation proposed by Liu et al. [27]. However, this method needs to train a model with a large number of facial images specific to both the target and source characters, which is very time-consuming. FaceSwap [24] was based on the traditional method of facial region extraction and exchange, which is more lightweight than DeepFakes. FSGAN [25] was a recursive neural network-based face reconstruction algorithm. It can transform the target face based on the source face on a pretrained face model with high efficiency and convenience. FaceShifter [26] can fully extract and adaptively integrate target attributes to generate a realistic face, by extracting multilevel target face attributes and adding adaptive attentional denormalization layers. It also introduces a heuristic error acknowledging refinement network that can effectively solve the face occlusion problem.
In terms of expression exchange, Thies et al. [28] used an RGB-D camera to track and reconstruct 3D models of two people's faces and realized facial reconstruction and expression exchange. Face2Face [29] optimized the expression exchange algorithm by combining 3D reconstruction and video reproduction technology.
Thies et al. [30] further proposed a neural texture method, which made the expression exchange more realistic and natural by rerendering the 3D content.
With the improvement of the reconstruction quality of facial manipulation algorithms, the generated fake videos have reached the level of genuine ones. It is hard to accurately identify the authenticity of these videos with the human eye alone.
This poses a threat to the information security of society and the public. If effective measures are not taken, it will certainly bring great harm to social security and stability. Therefore, facial manipulation video detection is an active research topic with considerable social and practical value.

Facial Manipulation Detection.
Facial manipulation detection algorithms have been developed to address the threat from facial manipulation videos. Some researchers detect fake videos through manual feature extraction. Li et al. [31] detected facial manipulation videos based on the biological signal of blinking in the videos. Yang et al. [32] showed that estimating the 3D head pose in facial images, combined with an SVM (support vector machine) classifier, can effectively detect fake videos. Amerini et al. [33] used the optical flow method to detect facial manipulation videos. Durall et al. [34] found that facial manipulation videos can be detected by simple frequency-domain analysis alone.
On the other hand, deep neural networks have also been applied to detecting face-manipulated videos. Li et al. [35] proposed to use a convolutional neural network (CNN) to detect artifacts generated during facial manipulation. MesoNet [7], inspired by InceptionNet, effectively detected fake videos generated by DeepFakes [23] and Face2Face [29]. Nguyen et al. [10] proposed to use the capsule network [36] for facial manipulation video detection, which can effectively detect multiple types of fake videos. In [37], the authors used Xception to detect face-manipulated videos with excellent results, so Xception is also one of the baseline models used by various new methods for comparison [38][39][40][41]. Compared with methods based on handcrafted features, methods based on deep neural networks generally achieve higher detection accuracy [42][43][44].

Adversarial Attacks and Defenses.
An adversarial attack [15] occurs when an attacker generates a corresponding adversarial example [45] by maliciously adding a small perturbation to an original example. Such perturbations are not only undetectable by human eyes but can also lead to misclassification by trained models. Deep neural networks and many other pattern recognition models were found to be vulnerable to adversarial attacks in previous studies [15]. It has also been verified that, in the field of facial manipulation detection, various methods are vulnerable to attacks from adversarial examples. Neekhara et al. [12] used the L∞ distortion metric as the constraint for adding perturbations and optimized it using a gradient sign-based approach. They studied robust attack methods for facial manipulation detection networks from the two aspects of white- and black-box attacks. Gandhi and Jain [11] used the classical FGSM and C&W attack methods to attack VGG [18] and ResNet [17] and found that the classification accuracy of the two networks was reduced to less than 27%. By modifying the latent space of the generator, Carlini and Farid [13] carried out white- and black-box attacks against two kinds of networks [3,14]; similarly, classification accuracy was markedly reduced by this method. Most of the above methods apply classical adversarial attacks, such as FGSM [15], which tend to generate visible noise when producing adversarial examples, while methods such as C&W [16] are less efficient attackers. In this paper, we use three improved methods, BIM [20], PGD [21], and FFGSM [22], to attack facial manipulation detectors. These three methods not only generate less noisy adversarial examples but are also more efficient and thus more difficult to defend against.
In order to defend against adversarial attacks effectively, a variety of defense methods have been investigated [46]. Goodfellow et al. [15] first proposed that adversarial training could be performed by adding adversarial examples to the training set to enhance the robustness of models. Bhagoji et al. [48] proposed to use dimension-reduction techniques such as principal component analysis (PCA) for defense. However, in the field of facial manipulation detection, only Gandhi and Jain [11] applied Lipschitz regularization [49] and deep image prior (DIP) [50] to resist attacks by adversarial examples.
The Lipschitz regularization method [49] enhances the robustness of models against adversarial attacks by constraining the gradient of the detector with respect to the inputs. However, this method is limited by the gradient calculation method and can only be applied to some networks (only to ResNet in the cited paper). Moreover, its accuracy improvement is very limited (only a 10% improvement in detection accuracy), and there is a bottleneck in real-scenario applications. The DIP method [50] was an unsupervised technique to eliminate interference by preprocessing the data before feeding it to the classifier. However, its optimal accuracy can only be achieved after 6,000 iterations of the model, so it is very time-consuming. The defense methods mentioned above consider only the proactive defense of the models and suffer from long running times, limited application scenarios, and limited accuracy improvement. In this paper, we propose a new defense strategy.
This strategy designs two effective defense methods, bilateral filtering and joint adversarial training, from the perspectives of passive and proactive defense, respectively.

Adversarial Attack.
Adversarial attacks against machine learning models can be classified into two types, namely white- and black-box attacks. The distinction depends on whether the attacker has access to prior knowledge of the model. Specifically, if adversarial examples are generated with model A to attack model B, and both are the same model, it is a white-box attack; otherwise, it is a black-box attack.
In this paper, we use the following three typical adversarial attack methods on facial manipulation detectors. The basic iterative method (BIM) [20] is also known as the iterative FGSM (I-FGSM) algorithm. Compared with the classical FGSM, this method uses an iterative approach to find the perturbation of each pixel, rather than changing all pixels greatly at once. BIM can effectively reduce the perturbation noise. Projected gradient descent (PGD) [21] is also an iterative implementation of the FGSM algorithm. However, compared with BIM, PGD further increases the number of iterations and adds a randomization layer, initialized with uniform random noise. PGD is very effective against both linear and nonlinear models, and it is one of the most powerful first-order attack methods available.
Wong et al. [22] proposed the fast FGSM (FFGSM) attack method, which was developed for fast adversarial training with the FGSM attack. Compared with the traditional FGSM algorithm, this method adds random initialization. Through this simple random initialization operation, FFGSM not only accelerates the generation of adversarial examples but also retains a strong attack effect. Figure 2 shows the framework of the adversarial attack process in this paper. We use two types of attacks on the facial manipulation detectors, white- and black-box, respectively. In this case, the target model is Xception [19], and the substitute model in the black-box attack is Meso-Inception [7]. It is important to note that both the white- and black-box attacks occur in the testing stage of the model. This means that both the substitute and target models are trained models, and both are trained on the same training set to ensure the transferability of the generated adversarial examples.
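To make the shared mechanics of these sign-based attacks concrete, the following minimal numpy sketch (our own illustration, not code from any cited work) runs a PGD-style iteration against a hypothetical logistic-regression "detector"; the random-start line is what PGD (and, in single-step form, FFGSM) adds over BIM:

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.4, alpha=0.1, steps=20, seed=0):
    """PGD-style attack on a toy logistic-regression model p = sigmoid(w.x + b).

    Each step is an FGSM-style sign step; the result is projected back into
    the L-infinity eps-ball around x and clipped to the valid pixel range.
    Setting steps=1 without the random start recovers FGSM; dropping only
    the random start recovers BIM."""
    rng = np.random.default_rng(seed)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # randomized start
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))    # model confidence
        grad = (p - y) * w                 # d(cross-entropy)/dx for this model
        x_adv = x_adv + alpha * np.sign(grad)         # ascend the loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)     # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)              # stay in pixel range
    return x_adv
```

The projection step is what keeps the perturbation imperceptible: however many iterations run, the final example never leaves the eps-ball around the original input.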
In this paper, we propose a new defense strategy that can effectively defend facial manipulation detectors against adversarial attacks. Specifically, we use bilateral filtering as the passive defense method and joint adversarial training as the proactive defense method. We describe the proposed approach in detail in the next section. The overall framework of the proposed defense strategy is shown in Figure 3.

Passive Defense.
In this paper, we use the bilateral filtering method as the passive defense method. Passive defense occurs in the preprocessing stage on the model inputs, and it is a simple and effective way to enhance the robustness of the model without retraining.
Bilateral filtering is a spatial smoothing method whose main purpose is image noise reduction. The bilateral filter is a nonlinear filter.
This filter combines the spatial proximity of images with the similarity of pixel values: it considers both spatial proximity information and color similarity information, removing noise and smoothing the image while maintaining edge detail. The formula is as follows:

g(i, j) = Σ_{k,l} f(k, l) w(i, j, k, l) / Σ_{k,l} w(i, j, k, l),  (1)

where (i, j) is the corresponding pixel position, g(i, j) denotes the output image, f(k, l) denotes the input image, and w(i, j, k, l) is the weight calculated from the two Gaussian functions. The basic idea of the bilateral filter is that the weights calculated from spatial proximity and those calculated from pixel similarity are multiplied, and the resulting weights are convolved with the image, removing noise while keeping edges. Through comparative experiments, we found that the bilateral filtering method not only retains the edge information of the image well but also performs best in enhancing the robustness of the detector. As shown in Figure 4, compared with the original images, the images processed by median filtering, mean filtering, and Gaussian filtering show edge blurring, whereas the images processed by bilateral filtering effectively retain the edge information while denoising. The specific defense process of the bilateral filtering method is shown in the green dotted box in Figure 3. The attacker generates adversarial examples in the test stage of the target model. We add a bilateral filter in the data preprocessing stage so that the adversarial examples pass through it and are denoised. The denoised examples are then fed into the target model, allowing it to resist the attack of the adversarial examples.
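Equation (1) can be sketched directly in numpy (a minimal grayscale illustration of the formula, not the implementation used in our experiments, which can rely on standard library routines):

```python
import numpy as np

def bilateral_filter(img, d=9, sigma_space=75.0, sigma_color=75.0):
    """Bilateral filtering of a 2D grayscale array, following equation (1):
    g(i, j) = sum_{k,l} f(k, l) w(i, j, k, l) / sum_{k,l} w(i, j, k, l)."""
    img = img.astype(np.float64)
    r = d // 2
    h, w = img.shape
    # Spatial-proximity Gaussian weights, identical for every window position.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_space ** 2))
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + d, j:j + d]
            # Pixel-similarity (color) Gaussian weights for this window.
            color = np.exp(-((window - img[i, j]) ** 2) / (2.0 * sigma_color ** 2))
            weights = spatial * color  # the two weights are multiplied
            out[i, j] = np.sum(window * weights) / np.sum(weights)
    return out
```

Because the color weight collapses across strong intensity differences, pixels on the far side of an edge contribute little, which is why noise is averaged out while edges survive.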

Proactive Defense.
In this paper, we use joint adversarial training as the proactive defense method. Its principle can be summarized as follows:

min_θ E_{(X, y)} [ max_δ L(f_θ(X + δ), y) ],  (2)

where X is the input data, δ denotes the perturbation superimposed on the input, f_θ is the neural network function, and y is the label of the example. L(f_θ(X + δ), y) is the loss obtained by superimposing the perturbation δ on example X and comparing the network output with the label y. The inner max(L) denotes the attacker's optimization goal, namely to find the perturbation that maximizes the loss function. The outer layer of formula (2) is the minimization over the neural network parameters; that is, with the perturbation fixed, the network is trained to minimize the loss on the training data. In other words, the model is made robust enough to accommodate such perturbations. Traditional adversarial training can only target one particular adversarial attack method: after training, the model can defend against that attack but not against other kinds of adversarial attacks. Aiming at this defect, this paper designs a proactive defense method that can defend against multiple adversarial attack methods, namely the joint adversarial training method. The framework of the joint adversarial training method is shown in the orange dotted box in Figure 3.
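The min-max objective of formula (2) and the mixing of several attacks can be sketched on a toy logistic-regression model (a minimal illustration under our own assumptions; a single-step FGSM stands in for the BIM, PGD, and FFGSM attacks used in the actual experiments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps=0.1):
    # Single-step sign attack, standing in for BIM/PGD/FFGSM.
    p = sigmoid(X @ w + b)
    return np.clip(X + eps * np.sign((p - y)[:, None] * w), 0.0, 1.0)

def joint_adversarial_train(X, y, attacks, epochs=300, lr=0.5, seed=0):
    """Each epoch, every attack in `attacks` generates adversarial examples
    against the *current* model (the inner max of (2)); those are mixed
    with the clean examples, and the model is updated on the combined
    batch (the outer min)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, X.shape[1])
    b = 0.0
    for _ in range(epochs):
        batches = [X] + [atk(X, y, w, b) for atk in attacks]
        Xm = np.vstack(batches)                # clean + adversarial mix
        ym = np.tile(y, len(batches))
        p = sigmoid(Xm @ w + b)
        w -= lr * (p - ym) @ Xm / len(ym)      # gradient of mean BCE loss
        b -= lr * np.mean(p - ym)
    return w, b
```

Passing several attack callables in `attacks` is the "joint" part: the model sees examples from every attack in each update, rather than being hardened against a single method.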

Experiment Setting.
In the experiment, we use the FaceForensics++ benchmark [3] as the data set for model training and testing. FaceForensics++ contains 1,000 real videos and 5,000 fake videos generated by five facial manipulation algorithms, with the videos divided into three quality levels. We used three subsets of FaceForensics++, consisting of facial manipulation videos generated by three algorithms: DeepFakes (DF), Face2Face (F2F), and FaceSwap (FS). Each subset contains 1,000 fake videos. The data set also has 1,000 real videos crawled from YouTube. We divided each subset of videos into a training set, testing set, and validation set in the ratio of 7:2:1; that is, they consist of 700, 200, and 100 real videos along with the fake videos generated by the corresponding facial manipulation algorithms. Then, we randomly extract frames from each video. The face detector in the Dlib library is used to extract images of the face region, which serve as the inputs for model training, validation, and testing. For the models, we use Xception [19] as the target model for attack and defense and MesoInception [7] as the substitute model for Xception.
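The 7:2:1 split described above can be sketched as follows (a minimal illustration; the function name and fixed seed are our own, and splitting at the video level is what keeps all frames of one video inside a single partition):

```python
import random

def split_videos(video_ids, seed=0):
    """7:2:1 train/test/validation split over video identifiers, so that
    frames extracted later never leak between partitions."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)      # deterministic, reproducible shuffle
    n_train = int(0.7 * len(ids))
    n_test = int(0.2 * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    val = ids[n_train + n_test:]
    return train, test, val
```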
During the white-box attack experiments, we first pretrained the target model using the three subsets of FaceForensics++ separately. Then, in the test stage, BIM [20], PGD [21], and FFGSM [22] are used, respectively, to carry out white-box attacks on the trained target model and generate adversarial examples. We set the attack intensity of BIM, PGD, and FFGSM to ε = 1/255 each. During the black-box attack experiments, we first pretrained the target and substitute models using the same three subsets of FaceForensics++. The three adversarial attack methods were then used to carry out white-box attacks on the trained substitute model to generate adversarial examples. Since the black-box transfer attack is less effective than the white-box attack, we appropriately increased the attack intensity when attacking the substitute model to ensure the success of the black-box attack; specifically, we increased the attack intensity of BIM, PGD, and FFGSM to ε = 8/255 each. Then, the black-box attack is carried out on the target model using the generated adversarial examples. The white-box attack effects of the three attack methods are shown in Figure 5. In this figure, rows represent the type of adversarial attack to which the images are subjected, and columns represent the type of facial manipulation method, where the first row contains the original images that have not been attacked and the first column contains the real images. It can be seen that the adversarial examples generated by the three attack methods all have minimal noise; the original examples and the adversarial examples cannot be distinguished by the naked eye.
For the passive defense experiment, we carried out a comparative filtering experiment in the test stage of the target model. In other words, the adversarial examples generated by attacking the target model are input into the target model with and without bilateral filtering to test its defensive performance. In the experiment, we set the neighborhood diameter of the bilateral filter to 9, and the standard deviations of the spatial Gaussian function and the gray-similarity Gaussian function are both 75. For the proactive defense experiment, the three attack methods, BIM, PGD, and FFGSM, are used during the training of the target model to carry out joint adversarial training; that is, three kinds of adversarial examples are generated simultaneously during training, and these adversarial examples are mixed with the original examples as the training set of the target model. After training is completed, the three adversarial attack methods are used to attack the model in the test stage. The test evaluates the robustness of the target model against adversarial attacks after joint adversarial training.

Results and Discussion
We first tested the accuracy of the trained Xception model on the original examples, and the results are shown in Tables 1 and 2. The Xception model achieves high detection accuracy on all three unperturbed subsets of the FaceForensics++ data set. When we apply the three adversarial attack methods in white- and black-box attacks, the performance of the target model declines sharply.
Next, we will show the defensive performance of the bilateral filtering method and joint adversarial training method in resisting white-and black-box attacks, respectively, from the perspective of passive and proactive defenses.

Passive Defense.
In the experiment, we first test the performance of four spatial smoothing methods, namely mean filtering, median filtering, Gaussian filtering, and bilateral filtering, in resisting the white-box attack. In Table 1, boldfaced numbers indicate the best precision indexes. From Table 1, all four kinds of spatial filtering improve the robustness of the target model against white-box attacks; however, the bilateral filtering method shows the best effect under all kinds of adversarial attacks. Next, we further test the effect of bilateral filtering in defending against black-box attacks. The results are shown in Table 2. Similarly, from Table 2, the introduced bilateral filtering effectively improves the robustness of the target model against black-box attacks.
When performing adversarial attacks, attackers usually try to introduce noise perturbations that are difficult to detect with the naked eye, causing the target model to misclassify. From the results of Tables 1 and 2, the proposed passive defense method based on bilateral filtering can effectively perform noise reduction on the input images. The perturbations generated by the attacker are counteracted, thus rendering the adversarial attack ineffective.
It is well known that existing facial manipulation detectors mostly rely on artifacts in the image, and the bilateral filtering method proposed in this paper is mainly used to denoise the image. Therefore, after the experiment, we analyzed the detection statistics of the samples. We found that the proposed bilateral filtering method does not increase the number of false-positive and false-negative samples.

Proactive Defense.
One criterion for evaluating the performance of adversarial training is that the model after adversarial training not only defends effectively against adversarial attacks but also maintains high classification accuracy on the original examples. At the same time, for our proactive defense experiment, we expect the model to defend well against all three adversarial attack methods after joint adversarial training. As shown in Tables 3 and 4, the Xception model after joint adversarial training shows good accuracy against all three white- and black-box attacks while maintaining the classification accuracy on the original examples as much as possible. It can be seen that using joint adversarial training as a proactive defense method effectively improves the robustness of the Xception model.
Finally, we test the performance of the two existing defense methods (the Lipschitz regularization method and the deep image prior method) and the two proposed defense methods against unseen adversarial attacks. Specifically, we used the FGSM attack method to carry out white- and black-box attacks on the Xception model and then applied the four defense methods against the attack. The experimental results are shown in Table 5. It can be seen from the data in the table that the two defense methods proposed in this paper still maintain good defensive performance against this unseen attack.

Conclusions
Various facial manipulation detectors have been introduced to address the security issues associated with facial manipulation. However, most existing models are vulnerable to adversarial example attacks and clearly have security vulnerabilities. In this paper, a new defense strategy for facial manipulation detectors has been proposed to address the vulnerability of detectors to adversarial example attacks. Specifically, two defense methods, bilateral filtering and joint adversarial training, are introduced from the passive and proactive defense perspectives, respectively. The bilateral filtering method can be used instantly without any modification to the model, which is very convenient and effective, while the joint adversarial training method can effectively defend against multiple adversarial attacks, giving the facial manipulation detection model better robustness. The effectiveness of the two methods is demonstrated through various comparative experiments and analyses. The reasons for the defense failure of a small number of adversarial examples are also analyzed qualitatively from the example perspective. For future work, we will continue to address the reasons for defense failures, and more powerful defense methods will be introduced to make facial manipulation detection models more robust.

Data Availability
The data set used to support the findings of this study can be obtained by contacting the authors of [37].

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.