DOI: 10.1145/3308558.3313707
research-article

Semantic Text Matching for Long-Form Documents

Published: 13 May 2019

ABSTRACT

Semantic text matching is a central research problem in many domains, including information retrieval, question answering, and recommendation. Among the different types of semantic text matching, matching a long document against another long document has many applications but has rarely been studied. Most existing approaches to semantic text matching have had limited success in this setting because they cannot capture and distill the main ideas and topics of long-form text.

In this paper, we propose a novel Siamese multi-depth attention-based hierarchical recurrent neural network (SMASH RNN) that learns the semantics of long-form text and enables semantic matching between long-form documents. In addition to word information, SMASH RNN uses document structure to improve the representation of long-form documents. Specifically, SMASH RNN synthesizes information from different levels of document structure: paragraphs, sentences, and words. An attention-based hierarchical RNN derives a representation for each structural level. The representations learned at the different levels are then aggregated into a more comprehensive semantic representation of the entire document. For semantic text matching, a Siamese structure couples the representations of a pair of documents and infers a probabilistic similarity score.
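The architecture described above can be sketched in plain Python as a toy illustration. This is not the authors' implementation, and several choices are assumptions made only for the sketch: a plain Elman RNN stands in for the recurrent cell (the abstract does not name one), attention is additive (score = v . tanh(A h)), the paragraph-, sentence-, and word-depth document vectors are aggregated by concatenation, and the Siamese head applies a logistic layer to the symmetric features |u - v| and u * v. The names `AttentiveRNN`, `DocumentEncoder`, `SiameseMatcher`, and `DIM` are hypothetical, and all weights are random and untrained.

```python
import math
import random

DIM = 4  # hypothetical toy embedding size; the abstract gives no dimensions


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AttentiveRNN:
    """One hierarchy level: a plain Elman RNN (assumed stand-in for the
    recurrent cell) followed by additive attention pooling over its states."""

    def __init__(self, dim, rng):
        self.W = rand_matrix(dim, dim, rng)  # input-to-hidden weights
        self.U = rand_matrix(dim, dim, rng)  # hidden-to-hidden weights
        self.A = rand_matrix(dim, dim, rng)  # attention projection
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # attention context vector

    def encode(self, xs):
        # Recurrence: h_t = tanh(W x_t + U h_{t-1}).
        h, states = [0.0] * len(self.W), []
        for x in xs:
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
            states.append(h)
        # Attention: score_t = v . tanh(A h_t); weights = softmax(scores).
        scores = [sum(vi * math.tanh(z) for vi, z in zip(self.v, matvec(self.A, s)))
                  for s in states]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The attention-weighted sum of hidden states represents this level.
        return [sum(w * s[k] for w, s in zip(weights, states)) for k in range(len(h))]


class DocumentEncoder:
    """Encodes a document given as paragraphs -> sentences -> word vectors."""

    def __init__(self, dim, rng):
        self.word_to_sent = AttentiveRNN(dim, rng)
        self.sent_to_para = AttentiveRNN(dim, rng)
        self.para_to_doc = AttentiveRNN(dim, rng)
        self.sent_to_doc = AttentiveRNN(dim, rng)
        self.word_to_doc = AttentiveRNN(dim, rng)

    def encode(self, doc):
        # Sentence vectors, still grouped by paragraph.
        sent_vecs = [[self.word_to_sent.encode(s) for s in para] for para in doc]
        para_vecs = [self.sent_to_para.encode(sv) for sv in sent_vecs]
        # Three depths: paragraphs, sentences, and words each summarize the document.
        by_para = self.para_to_doc.encode(para_vecs)
        by_sent = self.sent_to_doc.encode([v for sv in sent_vecs for v in sv])
        by_word = self.word_to_doc.encode([w for para in doc for s in para for w in s])
        # Aggregate the depths by concatenation (an assumed aggregation step).
        return by_para + by_sent + by_word


class SiameseMatcher:
    """Scores a document pair with one shared encoder and a logistic output."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.encoder = DocumentEncoder(dim, rng)  # the same weights encode both documents
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(2 * 3 * dim)]
        self.b = 0.0

    def score(self, doc_a, doc_b):
        u, v = self.encoder.encode(doc_a), self.encoder.encode(doc_b)
        # Symmetric pair features |u - v| and u * v (assumed; not given in the abstract).
        feats = [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]
        z = sum(wi * f for wi, f in zip(self.w, feats)) + self.b
        return 1.0 / (1.0 + math.exp(-z))  # probability-like similarity in (0, 1)


rng = random.Random(42)


def toy_doc(n_paras, n_sents, n_words):
    """Random word vectors arranged as paragraphs of sentences (toy data)."""
    return [[[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_words)]
             for _ in range(n_sents)] for _ in range(n_paras)]


matcher = SiameseMatcher(DIM)
doc_a, doc_b = toy_doc(2, 3, 5), toy_doc(3, 2, 4)
print(matcher.score(doc_a, doc_b))  # untrained weights, so the value is arbitrary
```

Sharing one `DocumentEncoder` between both inputs is what makes the sketch Siamese: both documents are mapped into the same representation space, and because the pair features are symmetric, swapping the two documents yields the same score.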

We conduct an extensive empirical evaluation of SMASH RNN on three practical applications: email attachment suggestion, related article recommendation, and citation recommendation. Experimental results on public data sets show that SMASH RNN significantly outperforms competitive baseline methods across various classification and ranking scenarios for semantic matching of long-form documents.
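Because the model outputs a probabilistic similarity score, one score can serve both evaluation settings mentioned above: thresholding it gives a classification decision, and sorting candidates by it gives a ranking. The sketch below is illustrative only and does not reproduce the paper's evaluation protocol. It assumes a 0.5 threshold, uses token-set Jaccard similarity as a hypothetical stand-in scorer so the example runs on its own, and shows reciprocal rank as one common ranking measure, not necessarily one the paper reports. The helper names `classify`, `rank`, and `reciprocal_rank` are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a stand-in scorer, not SMASH RNN."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def classify(score: float, threshold: float = 0.5) -> bool:
    """Classification view: declare a match when the score clears a threshold."""
    return score >= threshold


def rank(query: str, candidates: Sequence[str],
         score_fn: Callable[[str, str], float]) -> List[Tuple[int, float]]:
    """Ranking view: order candidate indices by descending similarity to the query."""
    scored = [(i, score_fn(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reciprocal_rank(ranking: List[Tuple[int, float]], relevant: Set[int]) -> float:
    """1 / position of the first relevant candidate, or 0.0 if none is ranked."""
    for pos, (idx, _) in enumerate(ranking, start=1):
        if idx in relevant:
            return 1.0 / pos
    return 0.0


query = "siamese recurrent network for document matching"
candidates = [
    "a survey of email clients",
    "hierarchical recurrent network for long document matching",
    "siamese network for signature verification",
]
ranking = rank(query, candidates, jaccard)
print(ranking)  # [(1, 0.625), (2, 0.375), (0, 0.0)]
print([classify(s) for _, s in ranking])  # [True, False, False]
print(reciprocal_rank(ranking, {2}))  # 0.5
```

Swapping `jaccard` for a trained pairwise model leaves `classify` and `rank` unchanged, which is the point of producing a single calibrated score rather than separate classification and ranking outputs.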


  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 1,899 of 8,196 submissions, 23%
