DOI: 10.1145/3589334.3645701
Research Article | Open Access

Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience

Published: 13 May 2024

ABSTRACT

In the contemporary digital landscape, search engines play an invaluable role in information access, yet they often struggle with Cross-Lingual Information Retrieval (CLIR). Despite ongoing efforts to improve CLIR, current methods still leave users grappling with issues such as misplaced named entities and lost cultural context when querying in non-native languages. Advances based on Neural Machine Translation models and cross-lingual representations have helped, but they come with their own limitations. Large Language Models (LLMs) have brought a paradigm shift, transforming search engines from simple retrievers into generators of contextually relevant information. This paper introduces the Multilingual Information Model for Intelligent Retrieval (MIMIR). Built on the power of LLMs, MIMIR responds directly in the language of the user's query, reducing the need for post-search translation. Its architecture comprises a dual-module system: a retriever that searches multilingual documents and a responder that crafts answers in the user's desired language. Within a unified training framework, the two modules iteratively enhance each other: the retriever serves as a reward model supervising the responder, and the responder in turn produces synthetic data that refines the retriever's proficiency. Evaluations on the CLEF and MKQA benchmarks show that MIMIR outperforms existing models and effectively addresses traditional CLIR challenges.
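For intuition only, the following minimal, framework-free Python sketch illustrates the alternating loop described above: the retriever acts as a reward model for the responder, and responder-derived synthetic query-passage pairs are used to refine the retriever. Every name here (retriever_score, respond, synthesize_pairs, refine_retriever) is hypothetical, and the toy lexical scoring and term-weight update merely stand in for the paper's actual reward modeling, LLM generation, and reinforcement-learning optimization, which are not reproduced here.

    from collections import Counter

    def tokenize(text):
        return text.lower().split()

    def retriever_score(query, passage, weights):
        # Toy lexical retriever: weighted term overlap between query and passage.
        q, p = Counter(tokenize(query)), Counter(tokenize(passage))
        return sum(weights.get(t, 1.0) * min(q[t], p[t]) for t in q)

    def respond(query, passages, weights):
        # Toy responder: return the passage the retriever scores highest,
        # standing in for an LLM that generates an answer in the query language.
        return max(passages, key=lambda p: retriever_score(query, p, weights))

    def reward_from_retriever(query, answer, weights):
        # The retriever serves as the reward model supervising the responder.
        return retriever_score(query, answer, weights)

    def synthesize_pairs(passages):
        # Responder-side synthetic data: pseudo-queries derived from passages
        # (here just the first three tokens; a real system would generate them).
        return [(" ".join(tokenize(p)[:3]), p) for p in passages]

    def refine_retriever(weights, synthetic_pairs, lr=0.1):
        # Refine the retriever so terms from synthetic queries score higher.
        for query, _passage in synthetic_pairs:
            for t in tokenize(query):
                weights[t] = weights.get(t, 1.0) + lr
        return weights

    # One iteration of the alternating training loop sketched in the abstract.
    passages = ["berlin is the capital of germany",
                "le louvre est un musee a paris"]
    weights = refine_retriever({}, synthesize_pairs(passages))  # responder -> retriever
    answer = respond("capital of germany", passages, weights)   # retriever -> responder
    print(answer, reward_from_retriever("capital of germany", answer, weights))

In MIMIR itself, both modules are neural models and the updates are learned; the sketch only shows the direction of supervision in each half of the loop.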


Supplemental Material

rfp2374.mp4: supplemental video (MP4, 31.5 MB)

Published in

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024, 4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334

Copyright © 2024 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

Published: 13 May 2024
Qualifiers: research-article
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
