Two birds with one stone: Transforming and generating facial images with iterative GAN
Introduction
Image generation [1], [2], [3] and image transformation [4], [5], [6] are two important topics in computer vision. A popular approach to image generation is to learn a complex function that maps a latent vector to a generated realistic image. By contrast, image transformation refers to translating a given image into a new image with modifications to desired attributes or style. Both have wide practical applications. For example, a facial composite, which is a graphical reconstruction of an eyewitness's memory of a face [7], can assist police in identifying a suspect. In most situations, police must search for a suspect with only a single frontal picture. To improve the success rate, it is often necessary to generate more pictures of the target person with different poses or expressions. Therefore, face generation and transformation have been extensively studied.
Benefiting from the successes of deep learning, image generation and transformation have seen significant advances in recent years [8], [9]. With deep architectures, image generation and transformation can be modeled in more flexible ways than with traditional approaches. For example, the conditional PixelCNN [9] was developed to generate an image based on the PixelCNN (a generative model that predicts the image distribution pixel by pixel). The generation process of this model can be conditioned on visible tags or latent codes from other networks; however, the quality of the generated images and the convergence speed need improvement. In [1] and [10], variational auto-encoders (VAE) [11] were used to generate natural images. More recently, generative adversarial networks (GAN) [12] have been utilized to generate natural images [13] or transform images [4], [6], [14], [15] under conditional settings [16].
The existing approaches can be applied to face generation or face transformation separately; however, doing so has several disadvantages. First, face generation and face transformation are closely connected through a joint distribution of facial attributes, while current models usually address them in isolation (face generation [10], [17] or transformation [14], [15], [18]), which may limit prediction performance. Second, learning facial attributes has been neglected by existing methods of face generation and transformation, which can deteriorate the quality of facial images. Third, most existing conditional deep models do not preserve facial identity during face transformation [19] or generation [10].
To this end, we propose an iterative GAN with an auxiliary classifier, which not only generates high-fidelity face images from controlled input attributes but also integrates face generation and transformation by learning a joint distribution of facial attributes. We argue that the strong coupling between face generation and transformation allows each task to benefit the other. The iterative GAN can learn and even manipulate multiple facial attributes, which both improves image quality and satisfies the practical need to edit several facial attributes at once. In addition, to preserve facial identity, we regularize the iterative GAN with a perceptual loss in addition to the pixel loss, and we propose a quantitative metric to measure facial identity.
To train the proposed model, we adopt a two-stage approach, as shown in Fig. 1. In the first stage, we train a discriminator D, a generator G, and a classifier C by minimizing adversarial losses [12] and label losses as in [3]. In the second stage, G and D/C are trained iteratively with an integrated loss function, which includes a perceptual component [20] computed between D's hidden layers in stage 1 and stage 2, a latent-code loss between the input noise z and the output noise, and a pixel loss between the input real facial images and their corresponding rebuilt versions.
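The three stage-2 components can be sketched as follows. This is a minimal NumPy illustration assuming a simple L2 form for each term and illustrative equal weights; the paper's exact norms and weighting may differ.

```python
import numpy as np

def pixel_loss(x_real, x_rebuilt):
    # Per-pixel reconstruction error between a real image and its rebuilt version.
    return np.mean((x_real - x_rebuilt) ** 2)

def latent_loss(z_in, z_out):
    # Error between the input noise z and the noise recovered from the output.
    return np.mean((z_in - z_out) ** 2)

def perceptual_loss(feat_real, feat_rebuilt):
    # Error between D's hidden-layer activations for the real and rebuilt images.
    return np.mean((feat_real - feat_rebuilt) ** 2)

def integrated_loss(x_real, x_rebuilt, z_in, z_out, feat_real, feat_rebuilt,
                    w_pix=1.0, w_lat=1.0, w_per=1.0):
    # Weighted sum of the three components (the weights here are assumptions).
    return (w_pix * pixel_loss(x_real, x_rebuilt)
            + w_lat * latent_loss(z_in, z_out)
            + w_per * perceptual_loss(feat_real, feat_rebuilt))
```

A perfect reconstruction drives all three terms to zero, so the integrated loss vanishes exactly when the rebuilt image, the recovered noise, and the hidden features all match their inputs.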
In the proposed model, the generator G not only generates a high-quality facial image according to the input attributes (single or multiple) but also translates an input facial image with the desired attribute modifications. The fidelity of the output images is well preserved thanks to the iterative optimization of the proposed integrated loss. To evaluate our model, we design experiments from three perspectives: the necessity of the integrated loss, the quality of generated natural face images with specified attributes, and the performance of face generation. Experiments on the benchmark CelebA dataset [21] indicate the promising performance of the proposed model in face generation and face transformation.
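The two uses of the trained generator described above can be sketched as follows. Here `G`, `encode_z` (a hypothetical routine that recovers an image's latent code, which the latent-code loss makes possible), and the attribute-code layout are illustrative stand-ins, not the paper's actual interfaces.

```python
def generate_face(G, z, attributes):
    # Face generation: condition the generator on a sampled latent vector z
    # and the target attribute codes.
    return G(z, attributes)

def transform_face(G, encode_z, image, new_attributes):
    # Face transformation: recover the input image's latent code, then
    # regenerate the image with the desired attributes modified.
    z = encode_z(image)
    return G(z, new_attributes)
```

The key point is that transformation reuses the same generator as generation; only the source of the latent code differs, which is what couples the two tasks.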
Facial attributes recognition
Object recognition has long been an active research topic [22], [23], [24], especially human recognition, which takes facial attributes as its major reference [25], [26]. Such attributes include, but are not limited to, natural looks such as Arched_Eyebrows, Big_Lips, Double_Chin, and Male. Besides, some artificial attributes also contribute to this identification job, such as Glasses, Heavy_Makeup, and Wave_Hair. Even some expressions such as Smiling, Angry, and Sad can be labeled as a
Proposed model
We first describe the proposed model with the integrated loss, then explain each component of the integrated loss.
Network architecture
The iterative GAN includes three neural networks. The generator consists of a fully-connected layer with 8584 neurons and four de-convolutional layers with 256, 128, 64, and 3 channels, respectively, each with filter size 5*5. All filter strides of the generator G are set to 2*2. After being processed by the first fully-connected layer, the 100-dimensional input noise z is projected and reshaped to [Batch_size, 5, 5, 512]; the following four de-convolutional layers then transpose the tensor to [Batch_size, 16, 16, 256],
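The generator layers described above can be recorded compactly as follows. Only the values reported in the text appear; spatial sizes beyond the first de-convolution are not given in this snippet, so they are marked `None` rather than guessed.

```python
# Generator architecture as described in the text ("B" is the batch size).
GENERATOR_LAYERS = [
    {"layer": "fc",      "neurons": 8584,
     "out_shape": ("B", 5, 5, 512)},  # fully-connected projection, then reshape
    {"layer": "deconv1", "filters": 256, "kernel": (5, 5), "stride": (2, 2),
     "out_shape": ("B", 16, 16, 256)},
    {"layer": "deconv2", "filters": 128, "kernel": (5, 5), "stride": (2, 2),
     "out_shape": None},  # spatial size not reported in this snippet
    {"layer": "deconv3", "filters": 64,  "kernel": (5, 5), "stride": (2, 2),
     "out_shape": None},
    {"layer": "deconv4", "filters": 3,   "kernel": (5, 5), "stride": (2, 2),
     "out_shape": None},  # 3 channels, i.e. an RGB image
]
```

Note the channel count halves at each de-convolution until the final 3-channel RGB output, a common pattern in DCGAN-style generators.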
Experiments
We perform experiments on multiple tasks to verify the capability of the iterative GAN model: recognition of facial attributes, face image reconstruction, face transformation, and face generation with controllable attributes.
Conclusion
We propose an iterative GAN that performs face generation and transformation jointly by exploiting the strong dependency between the two tasks. To preserve facial identity, an integrated loss including both the per-pixel loss and the perceptual loss is introduced in addition to the traditional adversarial loss. Experiments on a real-world face dataset demonstrate the advantages of the proposed model in both generating high-quality images and transforming images with
Acknowledgments
This work was partially supported by the Natural Science Foundation of China (61572111, G05QNQR004), the National High Technology Research and Development Program of China (863 Program) (No. 2015AA015408), a 985 Project of UESTC (No. A1098531023601041), and three Fundamental Research Funds for the Central Universities of China (Nos. A03017023701012, JBK120509, and JBK140507).
References (52)
- Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, Proceedings...
- K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, D. Wierstra, Draw: a recurrent neural network for image generation,...
- C. Wang, C. Wang, C. Xu, D. Tao, Tag disentangled generative adversarial network for object image re-rendering, in:...
- A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, Proceedings of the...
- J. Zhu, P. Krahenbuhl, E. Shechtman, A.A. Efros, Generative visual manipulation on the natural image manifold,...
- D. Yoo, N. Kim, S. Park, A.S. Paek, I.S. Kweon, Pixel-level domain transfer, Proceedings of the European Conference on...
- J. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks,...
- et al., Use of facial composite systems in US law enforcement agencies, Psychol. Crime Law (2006)
- et al., Learning a deep convolutional network for image super-resolution, European Conference on Computer Vision (2014)
- A.V. Den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, K. Kavukcuoglu, Conditional image generation with...
- Generative adversarial nets, Advances in Neural Information Processing Systems
- Age progression/regression by conditional adversarial autoencoder, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Neural face editing with intrinsic image disentangling, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Conditional generative adversarial nets
- Generate identity-preserving faces by generative adversarial networks
- Visual recognition with humans in the loop, European Conference on Computer Vision
- Automated flower classification over a large number of classes, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing
- Attribute and simile classifiers for face verification, 2009 IEEE 12th International Conference on Computer Vision
Dan Ma received the M.Sc. degree from the University of Electronic Science and Technology of China in 2005. He is currently a Ph.D. student in the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests are machine learning and data mining, with a particular interest in GANs.
Bin Liu is an Assistant Professor with the Department of Statistics, Southwestern University of Finance and Economics, Chengdu, China. He received the Ph.D. degree in computer science from the University of Electronic Science and Technology of China. His research interests include machine learning and data mining. He was the recipient of the Best Student Paper runner-up Award at the 8th Asian Conference on Machine Learning and a best paper candidate at the 33rd IEEE Global Communications Conference.
Zhao Kang obtained his M.S. degree in theoretical physics from Sichuan University, China. He received his Ph.D. degree in May 2017 from Southern Illinois University Carbondale. He is now an assistant professor at the University of Electronic Science and Technology of China. His major research area is machine learning.
Jiayu Zhou received the Ph.D. degree in computer science from Arizona State University, Tempe, AZ, USA, in 2014. He is an Assistant Professor with the Department of Computer Science and Engineering at Michigan State University, East Lansing, MI, USA. His research interests include large-scale machine learning and data mining, and biomedical informatics. Prof. Zhou has served as a Technical Program Committee Member of premier conferences such as NIPS, ICML, and SIGKDD. He was the recipient of the Best Student Paper Award at the 2014 IEEE International Conference on Data Mining (ICDM) and the Best Student Paper Award at the 2016 International Symposium on Biomedical Imaging (ISBI).
Jianke Zhu received the Ph.D. degree in computer science from Zhejiang University in China. He is currently an associate professor at Zhejiang University. His research interests include computer vision and machine learning.
Zenglin Xu received the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong. He is currently a full professor at the University of Electronic Science and Technology of China. He has previously worked at Michigan State University, the Cluster of Excellence at Saarland University and the Max Planck Institute for Informatics, and later Purdue University. Dr. Xu's research interests include machine learning and its applications in information retrieval, health informatics, and social network analysis. He was elected to the 2013 China Youth 1000-Talent Program. He is the recipient of the Outstanding Student Paper Honorable Mention at AAAI 2015, the Best Student Paper runner-up at ACML 2016, and the 2016 Young Researcher Award from APNNS.