A Survey of GAN-Generated Fake Face Detection Methods Based on Deep Learning

Abstract: In recent years, with the rapid progress of generative adversarial networks (GANs), a photo-realistic face can be easily generated from a random vector, and the faces produced by advanced GANs are highly realistic. Even a well-trained viewer has difficulty distinguishing artificial faces from real ones, so detecting GAN-generated faces is a necessary task. This paper introduces methods for detecting GAN-generated fake faces and analyzes the advantages and disadvantages of these models based on their network structures, evaluation metrics, and the results obtained on their respective data sets. On this basis, the challenges faced in this field and future research directions are discussed.

researchers have begun to use deep learning to learn the main differences between real and fake faces directly. This paper reviews and summarizes these deep learning-based methods.
To better understand the detection methods that follow, Section 2 explains the principle and development of GANs. Section 3 then describes methods for detecting fake faces generated by GANs. Section 4 discusses the challenges and possible future research directions, and Section 5 concludes the paper.

Principle and Development of GANs
The generative adversarial network (GAN) was first proposed in 2014 by Goodfellow et al. [9]. The basic model is shown in Fig. 2. The underlying idea is simple: make two models fight each other, a generator G and a discriminator D. A random noise vector z drawn from a prior distribution is fed to G, which outputs the sample G(z). Both G(z) and real data x drawn from the data distribution p_data are then fed to the discriminator, which judges whether its input is real or generated. Through continuous training, D improves its discriminating ability while G makes its generated data more realistic in order to deceive D. G and D fight each other, each becoming stronger, until a relatively stable state is reached in which the discriminator can no longer recognize the data produced by the generator. The initial GAN has several flaws. First, training is unstable: it is difficult to guarantee the synergy between G and D. Second, the fake images lack diversity. Finally, there is no uniform, effective criterion for the quality of the generated images. Therefore, several improvements to GANs have been proposed. Radford et al. [10] proposed DCGAN (deep convolutional GAN), which mainly improves the network structure of the original GAN by implementing the generator and discriminator as convolutional neural networks; this improved training stability but did not solve the problem fundamentally. Mao et al. [11] proposed LSGAN, which changed the objective function from cross-entropy loss to least-squares loss. Arjovsky et al. proposed WGAN (Wasserstein GAN) [12], which replaced the JS divergence with the Wasserstein (EM) distance and can fundamentally address training instability. Gulrajani et al. [13] proposed WGAN-GP, which further improves WGAN. EBGAN [14] and BEGAN [15] can also generate very realistic face images. Tero Karras et al. [1] proposed PGGAN (progressive GAN); PGGAN and StyleGAN [2] use a step-by-step growing strategy to generate high-quality, high-resolution images. Other GANs such as CycleGAN [16] and StarGAN [17] can perform image-to-image translation and style transfer.
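The adversarial objective described above can be made concrete with a minimal numpy sketch. This is an illustration of the original GAN losses only, not any particular paper's implementation; the toy discriminator outputs below are made-up numbers.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Original GAN discriminator objective, written as a loss to minimize:
    -[log D(x) + log(1 - D(G(z)))], averaged over a batch."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Commonly used non-saturating generator loss: -log D(G(z))."""
    return -np.mean(np.log(d_fake + eps))

# Toy check: a discriminator that is confident and correct has a low loss,
# while the generator's loss is high because it is fooling nobody.
d_real = np.array([0.9, 0.95])   # D's outputs on real samples
d_fake = np.array([0.1, 0.05])   # D's outputs on generated samples
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

As training converges toward the stable state described above, D's outputs approach 0.5 on both real and fake samples and the two losses balance.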

Deep Learning-Based Methods for Detecting GAN-Generated Fake Faces
Since most discriminators and generators used in GANs are based on CNNs, it is reasonable to use CNN-based methods to detect fake faces.

Methods for PGGAN-Generated Fake Faces
PGGAN generates fake faces with nearly the highest quality and resolution among existing GAN models, and the resulting faces are very difficult for people to identify. Therefore, many researchers have devoted themselves to designing detection methods for PGGAN-generated fake faces.
Nhu et al. [18] employed a VGG-Net structure for detection, shown in Fig. 3. The structure consists of five modules, each with a convolutional layer and a max-pooling layer, used for feature extraction. The resulting feature maps are fed into fully-connected layers, and a softmax layer finally outputs the probability that the image is real. Mo et al. [19] observed that the main differences between real and fake faces are reflected in the residual domain. Accordingly, the images are first processed with a high-pass filter, and the resulting residuals are input to three layer groups, each consisting of a convolutional layer and a max-pooling layer. The output feature maps of the last group are aggregated and passed to two fully-connected layers, and a softmax layer outputs the probability that the image is real.
The results of the above two detection methods are shown in Tab. 1. The second method clearly performs better.
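The residual-domain idea in [19] can be sketched with a simple high-pass filter. The 3 × 3 Laplacian kernel below is only an illustration; the exact residual filter used in [19] may differ.

```python
import numpy as np

# Illustrative 3x3 Laplacian high-pass kernel (symmetric, so correlation
# and convolution coincide); the filter in [19] may be different.
KERNEL = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], dtype=float)

def high_pass_residual(img):
    """Valid-mode 2D filtering of a grayscale image with KERNEL.
    Smooth regions are suppressed; edges and fine texture survive."""
    h, w = img.shape
    kh, kw = KERNEL.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * KERNEL)
    return out

# A constant (perfectly smooth) image leaves no residual.
print(high_pass_residual(np.ones((5, 5))))  # all zeros
```

The residual image, rather than the pixel image, is what the three convolutional groups in [19] then consume.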

General Methods for GAN-Generated Fake Faces
In most cases, people do not know which GAN model produced a fake image. It is therefore desirable to design a general method for detecting fake faces generated by most GAN models.
Hsu et al. [20] proposed the deep forgery discriminator (DeepFD) structure shown in Fig. 4. It can detect images generated by various GANs and is no longer limited to a particular one. Paired data are fed into a feature extraction network whose parameters are updated through a contrastive loss [21], so that the extracted features become more discriminative; the authors call this jointly discriminative feature learning. The extracted feature maps are then fed into a discriminator that judges the authenticity of the image.
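The pairwise contrastive loss [21] used to train the feature extractor can be written compactly. The sketch below, with made-up 2-D feature vectors and a margin of 1.0, illustrates the idea only; the feature dimensionality and margin in [20] differ.

```python
import numpy as np

def contrastive_loss(f1, f2, same_source, margin=1.0):
    """Contrastive loss on a feature pair: pull same-source pairs together,
    push different-source pairs at least `margin` apart."""
    d = np.linalg.norm(f1 - f2)
    if same_source:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])   # same class as a, nearby
c = np.array([2.0, 0.0])   # other class, far from a
print(contrastive_loss(a, b, same_source=1))  # small: pair already close
print(contrastive_loss(a, c, same_source=0))  # 0.0: already beyond margin
```

Minimizing this loss over real/real, fake/fake, and real/fake pairs is what makes the extracted features separable before the discriminator sees them.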
During training, the authors used CelebA as the real-face data set, together with fake faces generated by DCGAN, LSGAN, and WGAN.
Figure 4: Framework proposed in [20]
Zhuang et al. [22] proposed a coupled network with two-step pairwise learning, shown in Fig. 5; the idea is similar to that of the previous work. Arguing that the contrastive loss is unstable, the authors instead used a triplet loss [23] to update the network parameters. They also noted that, since fake faces generated by different GANs may have different characteristics, a single one-stream CNN cannot extract all the relevant features. To solve this problem, the paper proposed a coupled deep neural network (CDNN) with both 3 × 3 and 5 × 5 convolution kernels: the 3 × 3 kernels extract local fake-face features, while the 5 × 5 kernels capture global ones. Finally, the feature maps are fed into a classifier with two fully-connected layers to obtain the prediction.
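The triplet loss [23] that [22] substitutes for the contrastive loss can be sketched the same way. The vectors and margin below are illustrative only, not the settings used in [22].

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: the anchor should be closer (in squared distance) to the
    positive (same class) than to the negative (other class) by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([1.0, 0.0])
pos = np.array([1.1, 0.0])   # same class, nearby
neg = np.array([0.0, 1.0])   # other class, far away
print(triplet_loss(anchor, pos, neg))  # 0.0: constraint already satisfied
```

Because each term involves three samples rather than a single pair, gradients vanish once a triplet satisfies the margin, which is one reason the loss trains more stably than the pairwise contrastive loss.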

The results of the above two methods are shown in Tab. 2. Precision is the proportion of correctly predicted samples among those predicted to be positive; recall is the proportion of correctly predicted samples among the actually positive samples. The results show that the two detection methods can not only detect images generated by different GANs but also achieve good performance. By studying the structure of GANs, References [24][25] found some invisible differences between real faces and GAN-generated fake faces. They analyzed the differences in the RGB, HSV, and YCbCr color spaces, using the chi-square distance to evaluate the gap between the statistics of GAN-generated images and those of real images: the larger the chi-square distance, the more obvious the difference. Researchers therefore began trying to extract features from the color spaces.
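The chi-square comparison of color statistics can be illustrated on channel histograms. The sketch below uses synthetic data (a random "real" channel and an artificially shifted "fake" one); the binning and normalization are assumptions, not the exact protocol of [24][25].

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (normalized internally).
    Larger values indicate a bigger statistical gap between the images."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=10_000)   # stand-in "real" color channel
fake = np.clip(real + 30, 0, 255)          # "fake" channel with shifted stats
h_real, _ = np.histogram(real, bins=32, range=(0, 256))
h_fake, _ = np.histogram(fake, bins=32, range=(0, 256))
print(chi_square_distance(h_real, h_fake))  # clearly positive gap
```

Identical statistics give a distance of zero, so the measure directly quantifies the "invisible differences" the references report between real and GAN-generated color distributions.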
In [26], features were extracted from the color spaces and then fed into the network for detection; the framework is shown in Fig. 6. The structure uses the co-occurrence matrix [27], widely used in image texture analysis, as the feature descriptor. For a given image, the features of each color component are first computed and then concatenated into a feature vector. Because high-frequency components can be better captured by high-pass filtering, the image is high-pass filtered during feature extraction. Finally, a classifier is trained to judge whether the input image is real or GAN-generated. For the training data, CelebA was chosen for real faces, and the fake faces were generated by DCGAN, WGAN, and PGGAN. After training, the final test results were promising, with accuracy above 95%.
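A gray-level co-occurrence matrix of the kind [26][27] build features from can be computed directly. The helper name, the single horizontal offset, and the tiny 3-level example below are illustrative assumptions; [26] computes co-occurrences over color components after high-pass filtering.

```python
import numpy as np

def cooccurrence_matrix(channel, levels=256, offset=(0, 1)):
    """Co-occurrence matrix of one image channel: C[p, q] counts how often
    value p is followed by value q at the given (dy, dx) pixel offset."""
    dy, dx = offset
    h, w = channel.shape
    C = np.zeros((levels, levels), dtype=np.int64)
    for y in range(h - dy):
        for x in range(w - dx):
            C[channel[y, x], channel[y + dy, x + dx]] += 1
    return C

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [2, 2, 2]], dtype=np.uint8)
C = cooccurrence_matrix(img, levels=3)
print(C)  # e.g. C[2, 2] == 3: the pair (2, 2) occurs three times horizontally
```

Flattening such matrices (one per color component and offset) and concatenating them yields the feature vector that the classifier in [26] is trained on.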

Figure 6: Framework proposed in [26]
The above methods do not consider the robustness of the model. In real life, most pictures transmitted over networks are post-processed (e.g., by JPEG compression or Gaussian noise), which increases the difficulty of detection.
He et al. [28] argued that common post-processing attacks make the abnormal traces in RGB space unreliable, while the statistical characteristics of the chrominance components in other color spaces may be more distinguishable and robust. The authors therefore designed a shallow CNN to extract features from the chrominance components, followed by a Random Forest (RF) [29] for classification; the flowchart is shown in Fig. 7. Experiments on six attacks, including JPEG compression, Gaussian noise, and bilateral filtering, achieved very good results, demonstrating that the model is robust. Through experimental and theoretical analysis, Liu et al. [30] concluded that CNNs mainly attend to regions with rich texture (such as skin) when performing fake-face detection, and that using the global texture can effectively improve robustness. The main architecture, shown in Fig. 8, uses the Gram matrix to extract a global texture representation of the image for detection. The authors tested the robustness and generalization of the model and obtained very good results.
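The Gram-matrix texture representation used in [30] is simple to state: channel-wise inner products of a feature map. The sketch below uses a tiny hand-built feature map; the normalization by spatial size is a common convention and an assumption here, not necessarily the exact formulation in [30].

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: G[i, j] is the (spatially
    averaged) inner product of channels i and j, a global texture summary
    that discards spatial layout."""
    c, h, w = features.shape
    F = features.reshape(c, h * w)
    return F @ F.T / (h * w)

feat = np.zeros((2, 2, 2))
feat[0] = 1.0          # channel 0: constant activation 1
feat[1] = 2.0          # channel 1: constant activation 2
G = gram_matrix(feat)
print(G)  # [[1. 2.], [2. 4.]]
```

Because G depends only on channel correlations and not on where activations occur, it captures exactly the global texture statistics that [30] found to generalize better than local patches.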

Challenges and Future Research Directions
The main challenges and open problems in the field of detecting GAN-generated fake face images are as follows:
(i) Generalization of detectors: GANs are developing rapidly, and new GANs will appear in the future. Detectors should also handle images generated by these new models, so researchers can consider improving detector generalization.
(ii) Robustness of detectors: Fake faces may be compressed on online media, so researchers can consider improving detectors' robustness against compression.
(iii) Mobile device detection: Due to their large computational cost, existing detectors are not suitable for mobile applications. Researchers can consider reducing the computational complexity of the detectors.
(iv) Large data sets: Currently, few large data sets are publicly available.

Conclusion
This paper introduced several methods for detecting fake faces generated by GANs. The results show that these detection methods achieve high accuracy. Although GANs are developing ever faster and future models may generate fake faces of even higher quality, the methods above offer useful inspiration: GANs cannot completely reproduce many intrinsic properties of real images, such as their color-space statistics. Future work can therefore focus on finding such differences between fake and real images and, based on them, developing more advanced detectors.
Funding Statement: This work is supported by National Natural Science Foundation of China (62072251).

Conflicts of Interest:
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.