ABSTRACT
In this paper, we present Smart Compose, a novel system that generates interactive, real-time suggestions in Gmail, assisting users in writing emails by reducing repetitive typing. In designing and deploying such a large-scale, complex system, we faced several challenges, including model selection, performance evaluation, serving, and other practical issues. At the core of Smart Compose is a large-scale neural language model. We leveraged state-of-the-art machine learning techniques for language model training, which enabled high-quality suggestion prediction, and constructed novel serving infrastructure for high-throughput, real-time inference. Experimental results show the effectiveness of our proposed system design and deployment approach. The system is currently deployed in Gmail.
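At its core, the kind of completion the abstract describes can be framed as autoregressive decoding from a language model conditioned on the text typed so far. The following is a minimal sketch of that idea only, not the paper's system: a toy bigram model over a handful of hypothetical email sentences stands in for the large-scale neural language model, and greedy next-word selection stands in for the production decoding and serving stack.

```python
from collections import defaultdict

# A tiny stand-in "email" corpus (illustrative only).
corpus = [
    "thank you for your help",
    "thank you for the update",
    "thank you for your time",
    "let me know if you have any questions",
]

# Count bigram transitions: counts[prev][next] = frequency.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def suggest(prefix, max_words=4):
    """Greedily extend the typed prefix with the most likely next words."""
    current = prefix.split()[-1]
    completion = []
    for _ in range(max_words):
        candidates = counts.get(current)
        if not candidates:
            break  # no continuation observed for this word
        current = max(candidates, key=candidates.get)
        completion.append(current)
    return " ".join(completion)

print(suggest("thank you"))  # -> "for your help"
```

A real system replaces the bigram table with a neural language model scored under tight latency budgets, and surfaces the suggestion only when the model's confidence is high enough.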
Gmail Smart Compose: Real-Time Assisted Writing