Abstract
Image-to-Image generation has been a widely explored research topic in recent times. A relatively less explored topic is multi-modal translation, i.e. Text-to-Image translation. In daily life, people use descriptions to identify persons. The perception of faces varies from person to person: different people may focus on different attributes of a face, some of which may be non-physical. With modern computer vision techniques, smart devices can also be empowered to perceive humans and other objects in their surroundings, making them intelligent systems. Generating images from text has vast applications in different fields. Until recently, deep learning was incapable of producing high-quality images. The introduction of GANs [4] has provided generative models that successfully synthesise images, text, speech, etc., and has also found application in Text-to-Image translation. Current approaches to image synthesis from text other than GANs are unable to generate images with satisfactory detail corresponding to the descriptions. This paper aims at enabling intelligent systems with the ability of Text-to-Image translation, focusing on generating human facial images from textual descriptions with satisfactory detail and consistency with the content of the descriptions. The proposed Text-to-Image translation model is capable of understanding different classes of attributes: physical attributes such as skin tone, hair, and eyes, as well as non-physical attributes such as expressions, feelings, and mood.
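The conditional-generation idea underlying such Text-to-Image GANs [4, 8] can be illustrated with a minimal sketch: an encoded text description is concatenated with a random noise vector and mapped by the generator to pixel values. The dimensions, the single dense layer, and the random weights below are purely illustrative assumptions for exposition, not the architecture of the proposed model, which uses far larger, trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real models use far larger sizes)
TEXT_DIM, NOISE_DIM, IMG_PIXELS = 16, 8, 64  # 64 pixels = an 8x8 "image"

# Randomly initialised generator weights (one dense layer stands in
# for the deep convolutional generator of an actual GAN)
W = rng.standard_normal((TEXT_DIM + NOISE_DIM, IMG_PIXELS)) * 0.1
b = np.zeros(IMG_PIXELS)

def generate(text_embedding: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Conditional generation: concatenate the text embedding with noise,
    then map the joint vector through the generator network."""
    z = np.concatenate([text_embedding, noise])
    return np.tanh(z @ W + b)  # pixel values squashed into (-1, 1)

text = rng.standard_normal(TEXT_DIM)   # stands in for an encoded description
img = generate(text, rng.standard_normal(NOISE_DIM))
print(img.shape)  # a flat vector of 64 pixel values
```

Sampling different noise vectors with the same text embedding would yield different images consistent with the same description, which is the property the conditioning mechanism is meant to provide.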
References
Chi, P., Nau, D.S.: Predicting the performance of minimax and product in game-tree. CoRR abs/1304.3081 (2013). http://arxiv.org/abs/1304.3081
Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. CoRR abs/1506.05751 (2015). http://arxiv.org/abs/1506.05751
Gelbukh, A.: Natural language processing. In: Fifth International Conference on Hybrid Intelligent Systems (HIS’05), p. 1 (2005)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Kingma, D.P., Welling, M.: An introduction to variational autoencoders. CoRR abs/1906.02691 (2019). http://arxiv.org/abs/1906.02691
Mansimov, E., Parisotto, E., Ba, J., Salakhutdinov, R.: Generating images from captions with attention (2016)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014). http://arxiv.org/abs/1411.1784
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014). https://doi.org/10.3115/v1/D14-1162
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2015). http://arxiv.org/abs/1511.06434
Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. CoRR abs/1605.05396 (2016). http://arxiv.org/abs/1605.05396
Stuart, A., Voss, J., Wiberg, P.: Conditional path sampling of SDEs and the Langevin MCMC method. Commun. Math. Sci. 2 (2004). https://doi.org/10.4310/CMS.2004.v2.n4.a7
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Wang, X., Gupta, A.: Generative image modeling using style and structure adversarial networks. CoRR abs/1603.05631 (2016). http://arxiv.org/abs/1603.05631
Xu, K., Li, C., Wei, H., Zhu, J., Zhang, B.: Understanding and stabilizing GANs' training dynamics with control theory (2020). https://openreview.net/forum?id=BJe7h34YDS
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 7354–7363. PMLR, Long Beach, California, USA (2019). http://proceedings.mlr.press/v97/zhang19d.html
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., Metaxas, D.N.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. CoRR abs/1612.03242 (2016). http://arxiv.org/abs/1612.03242
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks. CoRR abs/1604.02878 (2016). http://arxiv.org/abs/1604.02878
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Mukherjee, S., Nath, S.S., Singh, G.K., Banerjee, S. (2022). FACEIFY: Intelligent System for Text to Image Translation. In: Mukherjee, S., Muppalaneni, N.B., Bhattacharya, S., Pradhan, A.K. (eds) Intelligent Systems for Social Good. Advanced Technologies and Societal Change. Springer, Singapore. https://doi.org/10.1007/978-981-19-0770-8_5
DOI: https://doi.org/10.1007/978-981-19-0770-8_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0769-2
Online ISBN: 978-981-19-0770-8