Semantic Text Matching for Long-Form Documents

Authors:
Jyun-Yu Jiang (University of California, Los Angeles, USA)
Mingyang Zhang (Google, USA)
Cheng Li (Google, USA)
Michael Bendersky (Google, USA)
Nadav Golbandi (Google, USA)
Marc Najork (Google, USA)

WWW '19: The World Wide Web Conference, May 2019, Pages 795–806
https://doi.org/10.1145/3308558.3313707

Published: 13 May 2019

ABSTRACT

Semantic text matching is one of the most important research problems in many domains, including information retrieval, question answering, and recommendation. Among the different types of semantic text matching, matching one long document against another has many applications but has rarely been studied. Most existing approaches to semantic text matching have limited success in this setting because they cannot capture and distill the main ideas and topics of long-form text.

In this paper, we propose a novel Siamese multi-depth attention-based hierarchical recurrent neural network (SMASH RNN) that learns long-form semantics and enables semantic text matching between long-form documents. In addition to word information, SMASH RNN uses the document structure to improve the representation of long-form documents. Specifically, SMASH RNN synthesizes information from different document structure levels, including paragraphs, sentences, and words. An attention-based hierarchical RNN derives a representation for each document structure level. The representations learned from the different levels are then aggregated into a more comprehensive semantic representation of the entire document. For semantic text matching, a Siamese structure couples the representations of a pair of documents and infers a probabilistic score as their similarity.
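The data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the recurrent encoders are replaced by plain attention pooling to keep the example short, and every name in it (`attention_pool`, `encode_document`, `match_score`, `init_params`), the nested-list document layout, and the way the two document vectors are combined before the sigmoid are assumptions made for illustration only.

```python
import numpy as np


def attention_pool(vectors, context):
    """Softmax-attention weighted sum of the rows of `vectors` (shape (n, d))."""
    scores = vectors @ context
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ vectors


def encode_document(doc, ctx_word, ctx_sent, ctx_par):
    """Encode a document at three structure levels and concatenate the results.

    Assumed layout: `doc` is a list of paragraphs, a paragraph is a list of
    sentences, and a sentence is an (n_words, d) array of word embeddings.
    NOTE: the paper uses attention-based hierarchical RNNs at each level;
    this sketch uses attention pooling only.
    """
    # Sentence vectors, grouped by paragraph: one (n_sentences, d) array each.
    sents_per_par = [
        np.stack([attention_pool(sent, ctx_word) for sent in par]) for par in doc
    ]
    # Paragraph vectors: attention over the sentences of each paragraph.
    par_vecs = np.stack([attention_pool(sv, ctx_sent) for sv in sents_per_par])

    # One document-level vector per depth.
    all_words = np.vstack([sent for par in doc for sent in par])
    word_level = attention_pool(all_words, ctx_word)
    sent_level = attention_pool(np.vstack(sents_per_par), ctx_sent)
    par_level = attention_pool(par_vecs, ctx_par)

    # Aggregate the levels by concatenation (an assumed aggregation choice).
    return np.concatenate([word_level, sent_level, par_level])


def init_params(d, rng):
    """Randomly initialised parameters for the sketch (hypothetical helper)."""
    rep_dim = 3 * d  # word, sentence, and paragraph level
    return {
        "contexts": tuple(rng.normal(size=d) for _ in range(3)),
        "w": rng.normal(scale=0.1, size=4 * rep_dim),
        "b": 0.0,
    }


def match_score(doc_a, doc_b, params):
    """Siamese scoring: both documents pass through the same encoder weights."""
    ra = encode_document(doc_a, *params["contexts"])
    rb = encode_document(doc_b, *params["contexts"])
    # Assumed pair features: both vectors, their absolute difference, and
    # their element-wise product.
    features = np.concatenate([ra, rb, np.abs(ra - rb), ra * rb])
    logit = features @ params["w"] + params["b"]
    return 1.0 / (1.0 + np.exp(-logit))  # probabilistic similarity score
```

Calling `match_score(doc_a, doc_b, init_params(d, rng))` on two nested lists of `(n_words, d)` arrays returns a value strictly between 0 and 1. In a trained model the attention contexts and the output weights would be learned from labeled document pairs rather than sampled at random.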

We conduct an extensive empirical evaluation of SMASH RNN on three practical applications: email attachment suggestion, related article recommendation, and citation recommendation. Experimental results on public data sets demonstrate that SMASH RNN significantly outperforms competitive baseline methods across various classification and ranking scenarios in the context of semantic matching of long-form documents.

Published in
WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558
Editors: Ling Liu (Georgia Tech, USA), Ryen White (Microsoft Research, USA)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Semantic text matching
attention mechanism
hierarchical document structures
long documents
recurrent neural networks
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%