FACEIFY: Intelligent System for Text to Image Translation

Chapter in: Intelligent Systems for Social Good

Abstract

Image-to-image generation has been a widely explored research topic in recent times. A comparatively less explored topic is multi-modal translation, i.e. text-to-image translation. In daily life, people use verbal descriptions to identify persons, yet the perception of faces varies from person to person: different people may focus on different attributes of a face, including non-physical ones. With modern computer vision techniques, smart devices can also be empowered to perceive humans and the objects in their surroundings, making them intelligent systems. Generating images from text has wide applications across many fields. Until recently, deep learning was incapable of producing high-quality images; the introduction of GANs [4] provided generative models that have successfully been used to synthesise images, text, speech, etc., and that have also found application in text-to-image translation. Current approaches to image synthesis from text other than GANs are unable to generate images whose details satisfactorily correspond to the descriptions. This paper aims to enable intelligent systems with the ability of text-to-image translation, focusing on generating human facial images that are detailed and consistent with the content of their textual descriptions. The proposed text-to-image translation model understands different classes of attributes: physical attributes such as skin tone, hair, and eyes, as well as non-physical attributes such as expressions, feelings, and mood.
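To make the conditioning mechanism concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of how a conditional GAN generator [8] consumes a text description: the description is encoded to a fixed-length vector, concatenated with random noise z, and mapped through the generator to pixel space. The embedding sizes, the one-layer generator, and the toy bag-of-words encoder are all hypothetical stand-ins for the trained encoders and networks discussed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16   # hypothetical text-embedding size
NOISE_DIM = 8    # hypothetical latent-noise size
IMG_SIDE = 4     # tiny "image" for illustration only

# Hypothetical weights of a one-layer generator (a real model is much deeper).
W = rng.standard_normal((EMBED_DIM + NOISE_DIM, IMG_SIDE * IMG_SIDE))

def embed_text(description: str) -> np.ndarray:
    """Toy encoder: hash tokens into a fixed-size bag-of-words vector
    (a stand-in for the GloVe/LSTM text encoders cited in the chapter [5, 9])."""
    v = np.zeros(EMBED_DIM)
    for token in description.lower().split():
        v[hash(token) % EMBED_DIM] += 1.0
    return v

def generate(description: str) -> np.ndarray:
    """Conditional generation: G(z | text) = tanh([embed(text); z] W)."""
    z = rng.standard_normal(NOISE_DIM)
    conditioned_input = np.concatenate([embed_text(description), z])
    return np.tanh(conditioned_input @ W).reshape(IMG_SIDE, IMG_SIDE)

img = generate("a smiling woman with dark hair and brown eyes")
print(img.shape)  # (4, 4)
```

Because the noise z varies per call while the text embedding stays fixed, repeated calls with the same description yield different faces that all match the described attributes, which is the core behaviour a text-to-image GAN is trained to achieve.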


References

  1. Chi, P., Nau, D.S.: Predicting the performance of minimax and product in game-tree. CoRR abs/1304.3081 (2013). http://arxiv.org/abs/1304.3081

  2. Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. CoRR abs/1506.05751 (2015). http://arxiv.org/abs/1506.05751

  3. Gelbukh, A.: Natural language processing. In: Fifth International Conference on Hybrid Intelligent Systems (HIS’05), p. 1 (2005)

  4. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  6. Kingma, D.P., Welling, M.: An introduction to variational autoencoders. CoRR abs/1906.02691 (2019). http://arxiv.org/abs/1906.02691

  7. Mansimov, E., Parisotto, E., Ba, J., Salakhutdinov, R.: Generating images from captions with attention (2016)

  8. Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014). http://arxiv.org/abs/1411.1784

  9. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014). https://doi.org/10.3115/v1/D14-1162

  10. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). http://arxiv.org/abs/1511.06434

  11. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. CoRR abs/1605.05396 (2016). http://arxiv.org/abs/1605.05396

  12. Stuart, A., Voss, J., Wiberg, P.: Conditional path sampling of SDEs and the Langevin MCMC method. Commun. Math. Sci. 2 (2004). https://doi.org/10.4310/CMS.2004.v2.n4.a7

  13. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

  14. Wang, X., Gupta, A.: Generative image modeling using style and structure adversarial networks. CoRR abs/1603.05631 (2016). http://arxiv.org/abs/1603.05631

  15. Xu, K., Li, C., Wei, H., Zhu, J., Zhang, B.: Understanding and stabilizing GANs' training dynamics with control theory (2020). https://openreview.net/forum?id=BJe7h34YDS

  16. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 7354–7363. PMLR, Long Beach, California, USA (2019). http://proceedings.mlr.press/v97/zhang19d.html

  17. Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., Metaxas, D.N.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. CoRR abs/1612.03242 (2016). http://arxiv.org/abs/1612.03242

  18. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task cascaded convolutional networks. CoRR abs/1604.02878 (2016). http://arxiv.org/abs/1604.02878

Author information

Correspondence to Shyamapada Mukherjee.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Mukherjee, S., Nath, S.S., Singh, G.K., Banerjee, S. (2022). FACEIFY: Intelligent System for Text to Image Translation. In: Mukherjee, S., Muppalaneni, N.B., Bhattacharya, S., Pradhan, A.K. (eds) Intelligent Systems for Social Good. Advanced Technologies and Societal Change. Springer, Singapore. https://doi.org/10.1007/978-981-19-0770-8_5

  • DOI: https://doi.org/10.1007/978-981-19-0770-8_5

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0769-2

  • Online ISBN: 978-981-19-0770-8

  • eBook Packages: Computer Science (R0)
