ABSTRACT
Semantic text matching is a central research problem in many domains, including information retrieval, question answering, and recommendation. Among the different types of semantic text matching, matching one long document against another has many applications but has rarely been studied. Most existing approaches achieve limited success in this setting because they cannot capture and distill the main ideas and topics of long-form text.
In this paper, we propose a novel Siamese multi-depth attention-based hierarchical recurrent neural network (SMASH RNN) that learns the semantics of long-form documents and enables semantic text matching between them. In addition to word information, SMASH RNN uses the document structure to improve the representation of long-form documents. Specifically, SMASH RNN synthesizes information from different levels of the document structure, including paragraphs, sentences, and words. An attention-based hierarchical RNN derives a representation at each level of the document structure. The representations learned at the different levels are then aggregated into a more comprehensive semantic representation of the entire document. For semantic text matching, a Siamese structure couples the representations of a pair of documents and infers a probabilistic score of their similarity.
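The hierarchy, multi-depth aggregation, and Siamese scoring described above can be illustrated with a minimal, untrained sketch in pure Python. It is not the authors' implementation, and several choices are assumptions: a plain Elman RNN stands in for whatever recurrent cell SMASH RNN actually uses, each level is pooled with additive attention against a learned context vector, the three depths (paragraph, sentence, and word paths to the document) are combined by concatenation, and the pair score is a logistic layer over the symmetric features |a - b| and a * b. The names `RNNAttentionLayer`, `MultiDepthEncoder`, and `SiameseScorer`, and the toy dimensions, are hypothetical.

```python
import math
import random

# Hypothetical toy sizes; real embedding/hidden sizes would be much larger.
EMB, HID, VOCAB = 4, 4, 50


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RNNAttentionLayer:
    """Elman RNN over a sequence, then additive attention pooling (assumed design)."""

    def __init__(self, rng, in_dim):
        self.W = rand_matrix(rng, HID, in_dim)  # input -> hidden
        self.U = rand_matrix(rng, HID, HID)     # hidden -> hidden
        self.A = rand_matrix(rng, HID, HID)     # attention projection
        self.c = [rng.uniform(-0.5, 0.5) for _ in range(HID)]  # attention context vector

    def __call__(self, xs):
        h, states = [0.0] * HID, []
        for x in xs:
            # h_t = tanh(W x_t + U h_{t-1})
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
            states.append(h)
        # score_t = c . tanh(A h_t); alpha = softmax(score)
        scores = [sum(ci * math.tanh(u) for ci, u in zip(self.c, matvec(self.A, s)))
                  for s in states]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Attention-weighted sum of the hidden states
        return [sum(e / total * s[i] for e, s in zip(exps, states)) for i in range(HID)]


class MultiDepthEncoder:
    """Encodes a document (paragraphs -> sentences -> token ids) at three depths."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.emb = rand_matrix(rng, VOCAB, EMB)
        self.word_to_sent = RNNAttentionLayer(rng, EMB)
        self.sent_to_para = RNNAttentionLayer(rng, HID)
        self.para_to_doc = RNNAttentionLayer(rng, HID)
        self.sent_to_doc = RNNAttentionLayer(rng, HID)
        self.word_to_doc = RNNAttentionLayer(rng, EMB)

    def __call__(self, doc):
        sents = [[self.word_to_sent([self.emb[t] for t in s]) for s in p] for p in doc]
        # Paragraph depth: words -> sentences -> paragraphs -> document
        para_depth = self.para_to_doc([self.sent_to_para(p) for p in sents])
        # Sentence depth: words -> sentences -> document
        sent_depth = self.sent_to_doc([s for p in sents for s in p])
        # Word depth: words -> document
        word_depth = self.word_to_doc([self.emb[t] for p in doc for s in p for t in s])
        return para_depth + sent_depth + word_depth  # list concatenation, length 3 * HID


class SiameseScorer:
    """One shared encoder for both documents, then a logistic layer (assumed head)."""

    def __init__(self, seed=0):
        self.encoder = MultiDepthEncoder(seed)
        rng = random.Random(seed + 1)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(2 * 3 * HID)]
        self.b = 0.0

    def __call__(self, doc_a, doc_b):
        a, b = self.encoder(doc_a), self.encoder(doc_b)
        # Symmetric pair features: |a - b| and a * b
        feats = [abs(x - y) for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]
        logit = sum(w * f for w, f in zip(self.w, feats)) + self.b
        return 1.0 / (1.0 + math.exp(-logit))  # similarity probability


# Usage: two toy documents of token ids (untrained weights, so the score is arbitrary)
doc1 = [[[1, 2, 3], [4, 5]], [[6, 7, 8, 9]]]
doc2 = [[[1, 2, 3]], [[10, 11], [12, 13, 14]]]
model = SiameseScorer(seed=42)
print(round(model(doc1, doc2), 4))
```

Because both documents pass through the same encoder and the pair features are symmetric, the score in this sketch does not depend on argument order; with random weights, however, the value carries no meaning until the parameters are trained on labeled document pairs.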
We conduct an extensive empirical evaluation of SMASH RNN on three practical applications: email attachment suggestion, related article recommendation, and citation recommendation. Experimental results on public data sets demonstrate that SMASH RNN significantly outperforms competitive baseline methods across various classification and ranking scenarios for semantic matching of long-form documents.
Recommendations
Supervised Contrastive Learning for Interpretable Long-Form Document Matching
Recent advancements in deep learning techniques have transformed the area of semantic text matching (STM). However, most state-of-the-art models are designed to operate with short documents such as tweets, user reviews, comments, and so on. These models ...
Feature Differentiation and Fusion for Semantic Text Matching
Advances in Information Retrieval. Semantic Text Matching (STM for short) stands for the task of automatically determining the semantic similarity for a pair of texts. It has been widely applied in a variety of downstream tasks, e.g., information retrieval and question answering. ...
Semantic Annotating of Text Documents: Basic Concepts and Taxonomic Approach
One of the tools for the semantic enrichment of the content of information resources is semantic annotating, which makes it possible to comment on and evaluate annotated resources and their fragments and to carry out a semantic search on their basis. ...