ABSTRACT
In the contemporary digital landscape, search engines play an essential role in information access, yet they often struggle with Cross-Lingual Information Retrieval (CLIR). Despite ongoing efforts to improve CLIR, current methods still leave users grappling with issues such as misplaced named entities and lost cultural context when querying in non-native languages. Advances based on Neural Machine Translation models and cross-lingual representations have helped, but they are not without limitations. Large Language Models (LLMs) have brought a paradigm shift, transforming search engines from simple retrievers into generators of contextually relevant information. This paper introduces the Multilingual Information Model for Intelligent Retrieval (MIMIR). Built on LLMs, MIMIR responds directly in the language of the user's query, reducing the need for post-search translation. Its architecture comprises a dual-module system: a retriever that searches multilingual documents and a responder that crafts answers in the user's desired language. Through a unified training framework, in which the retriever serves as a reward model supervising the responder while the responder produces synthetic data to refine the retriever, the two modules iteratively enhance each other. Evaluations on the CLEF and MKQA benchmarks show that MIMIR outperforms existing models, effectively addressing traditional CLIR challenges.
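The mutual-supervision loop described above can be sketched in a few lines. This is a hypothetical simplification, not the paper's implementation: `retriever_score` stands in for a trained multilingual retriever, `responder_generate` for an LLM responder, and token overlap substitutes for learned relevance; only the loop structure (retriever as reward model, responder as synthetic-data source) mirrors the abstract.

```python
def retriever_score(query, passage):
    # Toy relevance: token overlap stands in for a trained multilingual retriever.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def responder_generate(query, passages):
    # Toy responder: answer with the passage the retriever scores highest.
    return max(passages, key=lambda p: retriever_score(query, p))

def training_round(queries, corpus):
    """One iteration of the unified framework: the retriever rewards the
    responder's outputs, and the responder emits synthetic (query, answer)
    pairs that would be used to refine the retriever in the next round."""
    synthetic_pairs, rewards = [], []
    for q in queries:
        answer = responder_generate(q, corpus)
        rewards.append(retriever_score(q, answer))   # retriever as reward model
        synthetic_pairs.append((q, answer))          # synthetic data for the retriever
    return synthetic_pairs, rewards
```

In the full system each round would update both modules (e.g. the responder via policy optimization against the reward, the retriever via contrastive training on the synthetic pairs); the sketch only shows the data flow between them.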
Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience