ABSTRACT
In the past few years, we have observed a renewed interest in conversational recommender systems (CRS) that interact with users in natural language. Most recent research efforts use neural models trained end-to-end on recorded recommendation dialogs between humans. Given the user’s utterances in a dialog, these systems aim to generate appropriate natural-language responses based on the learned models. An alternative to such language generation approaches is to retrieve, and possibly adapt, suitable sentences from the recorded dialogs. Approaches of this latter type have been explored far less in the current literature.
In this work, we revisit the potential value of retrieval-based approaches to conversational recommendation. To this end, we compare two recent deep learning models for response generation with a retrieval-based method that determines a set of response candidates using a nearest-neighbor technique and then reranks them heuristically. We adopt a user-centric evaluation approach in which study participants (N=60) rated the responses of the three compared systems. We were able to reproduce the claimed improvement of one of the deep learning methods over the other. However, the retrieval-based system outperformed both language generation-based approaches in terms of the perceived quality of the system responses. Overall, our study suggests that retrieval-based approaches should be considered as an alternative or complement to modern language generation-based approaches.
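As a rough illustration of the retrieve-then-rerank idea described above, the following minimal sketch picks a response from recorded dialogs. Everything in it is an assumption made for illustration and is not taken from the paper: the bag-of-words cosine similarity, the toy `DIALOGS` data, the function names, and the length-based reranking heuristic are all hypothetical stand-ins for whatever representation and heuristics the actual system uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (user utterance, recorded human response) pairs.
DIALOGS = [
    ("can you recommend a scary horror movie",
     "You could try The Conjuring, it is a well-known horror film."),
    ("i am looking for a good comedy",
     "Have you seen Superbad? Many people find it funny."),
    ("i like horror movies", "Nice!"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(utterance, dialogs, k=3):
    """Nearest-neighbor step: the k recorded responses whose preceding
    utterance is most similar to the current user utterance."""
    query = Counter(tokenize(utterance))
    scored = [(cosine(query, Counter(tokenize(ctx))), resp)
              for ctx, resp in dialogs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(candidates, min_tokens=4, penalty=0.2):
    """Assumed heuristic: demote very short, uninformative responses."""
    def adjusted(pair):
        sim, resp = pair
        return sim - (penalty if len(tokenize(resp)) < min_tokens else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)


def respond(utterance, dialogs, k=3):
    """Return the top reranked response, or None if there are no candidates."""
    ranked = rerank(retrieve(utterance, dialogs, k))
    return ranked[0][1] if ranked else None


if __name__ == "__main__":
    print(respond("recommend a horror movie please", DIALOGS))
```

In practice, the similarity function would more plausibly operate on learned sentence embeddings than on raw word counts, and the reranking step is where domain knowledge, such as whether a candidate mentions a suitable item, could be injected.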