Abstract
Image-to-Image generation has been a widely explored research topic in recent times. A relatively less explored topic is multi-modal translation, i.e. Text-to-Image translation. In daily life, people use descriptions to identify persons. The perception of faces varies from person to person: different people may focus on different attributes of a face, some of which may be non-physical. With modern computer vision techniques, smart devices can also be empowered to perceive humans and other objects in their surroundings, making them intelligent systems. Generating images from text has vast applications in different fields. Until recently, deep learning was incapable of producing high-quality images. The introduction of GANs [4] has provided generative models that successfully synthesise images, text, speech, etc., and has also found application in Text-to-Image translation. Current approaches to image synthesis from text other than GANs are unable to generate images with satisfactory detail corresponding to the descriptions. This paper aims at enabling intelligent systems with the ability of Text-to-Image translation, focusing on generating human facial images from textual descriptions with satisfactory detail and consistency with the content of the descriptions. The proposed Text-to-Image translation model is capable of understanding different classes of attributes: physical attributes such as skin tone, hair, and eyes, as well as non-physical attributes such as expressions, feelings, and mood.
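The conditional-generation idea underlying such Text-to-Image GANs [4, 8] can be illustrated with a minimal sketch: an encoded text description is concatenated with a random noise vector and mapped by the generator to pixel values. The dimensions, the single dense layer, and the random weights below are purely illustrative assumptions for exposition, not the architecture of the proposed model, which uses far larger, trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real models use far larger sizes)
TEXT_DIM, NOISE_DIM, IMG_PIXELS = 16, 8, 64  # 64 pixels = an 8x8 "image"

# Randomly initialised generator weights (one dense layer stands in
# for the deep convolutional generator of an actual GAN)
W = rng.standard_normal((TEXT_DIM + NOISE_DIM, IMG_PIXELS)) * 0.1
b = np.zeros(IMG_PIXELS)

def generate(text_embedding: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Conditional generation: concatenate the text embedding with noise,
    then map the joint vector through the generator network."""
    z = np.concatenate([text_embedding, noise])
    return np.tanh(z @ W + b)  # pixel values squashed into (-1, 1)

text = rng.standard_normal(TEXT_DIM)   # stands in for an encoded description
img = generate(text, rng.standard_normal(NOISE_DIM))
print(img.shape)  # a flat vector of 64 pixel values
```

Sampling different noise vectors with the same text embedding would yield different images consistent with the same description, which is the property the conditioning mechanism is meant to provide.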
References
Chi, P., Nau, D.S.: Predicting the performance of minimax and product in game-tree. CoRR abs/1304.3081 (2013). http://arxiv.org/abs/1304.3081
Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. CoRR abs/1506.05751 (2015). http://arxiv.org/abs/1506.05751
Gelbukh, A.: Natural language processing. In: Fifth International Conference on Hybrid Intelligent Systems (HIS’05), p. 1 (2005)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Kingma, D.P., Welling, M.: An introduction to variational autoencoders. CoRR abs/1906.02691 (2019). http://arxiv.org/abs/1906.02691
Mansimov, E., Parisotto, E., Ba, J., Salakhutdinov, R.: Generating images from captions with attention (2016)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014). http://arxiv.org/abs/1411.1784
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014). https://doi.org/10.3115/v1/D14-1162
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2015). http://arxiv.org/abs/1511.06434
Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. CoRR abs/1605.05396 (2016). http://arxiv.org/abs/1605.05396
Stuart, A., Voss, J., Wiberg, P.: Conditional path sampling of SDEs and the Langevin MCMC method. Commun. Math. Sci. 2 (2004). https://doi.org/10.4310/CMS.2004.v2.n4.a7
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Wang, X., Gupta, A.: Generative image modeling using style and structure adversarial networks. CoRR abs/1603.05631 (2016). http://arxiv.org/abs/1603.05631
Xu, K., Li, C., Wei, H., Zhu, J., Zhang, B.: Understanding and stabilizing GANs' training dynamics with control theory (2020). https://openreview.net/forum?id=BJe7h34YDS
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 7354–7363. PMLR, Long Beach, California, USA (2019). http://proceedings.mlr.press/v97/zhang19d.html
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., Metaxas, D.N.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. CoRR abs/1612.03242 (2016). http://arxiv.org/abs/1612.03242
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks. CoRR abs/1604.02878 (2016). http://arxiv.org/abs/1604.02878
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Mukherjee, S., Nath, S.S., Singh, G.K., Banerjee, S. (2022). FACEIFY: Intelligent System for Text to Image Translation. In: Mukherjee, S., Muppalaneni, N.B., Bhattacharya, S., Pradhan, A.K. (eds) Intelligent Systems for Social Good. Advanced Technologies and Societal Change. Springer, Singapore. https://doi.org/10.1007/978-981-19-0770-8_5
DOI: https://doi.org/10.1007/978-981-19-0770-8_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0769-2
Online ISBN: 978-981-19-0770-8