research-article

Open Access

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Authors:
Qi Chen

Microsoft, Beijing, China

Microsoft, Beijing, China

0009-0006-7394-0185
View Profile

,
Xiubo Geng

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0001-6477-7933
View Profile

,
Corby Rosset

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0001-9167-6214
View Profile

,
Carolyn Buractaon

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0005-2182-5145
View Profile

,
Jingwen Lu

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0001-8208-898X
View Profile

,
Tao Shen

University of Technology Sydney, Sydney, Australia

University of Technology Sydney, Sydney, Australia

0000-0003-3315-2468
View Profile

,
Kun Zhou

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0003-0650-9521
View Profile

,
Chenyan Xiong

Carnegie Mellon University, Pittsburgh, USA

Carnegie Mellon University, Pittsburgh, USA

0000-0002-0392-4183
View Profile

,
Yeyun Gong

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0001-9954-9674
View Profile

,
Paul Bennett

Spotify, New York, USA

Spotify, New York, USA

0009-0006-7852-9651
View Profile

,
Nick Craswell

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0002-9351-8137
View Profile

,
Xing Xie

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0002-8608-8482
View Profile

,
Fan Yang

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0002-0378-060X
View Profile

,
Bryan Tower

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0003-3659-6988
View Profile

,
Nikhil Rao

Microsoft, Mountain View, USA

Microsoft, Mountain View, USA

0000-0003-0281-932X
View Profile

,
Anlei Dong

Microsoft, Mountain View, USA

Microsoft, Mountain View, USA

0000-0002-8241-4746
View Profile

,
Wenqi Jiang

ETH Zurich, Zürich, Switzerland

ETH Zurich, Zürich, Switzerland

0000-0003-3895-7943
View Profile

,
Zheng Liu

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0001-7765-8466
View Profile

,
Mingqin Li

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0002-0270-9489
View Profile

,
Chuanjie Liu

Microsoft, Beijing, China

Microsoft, Beijing, China

0009-0009-0980-6936
View Profile

,
Zengzhong Li

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0002-8243-7769
View Profile

,
Rangan Majumder

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0003-2430-575X
View Profile

,
Jennifer Neville

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0007-1157-018X
View Profile

,
Andy Oakley

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0002-7208-2933
View Profile

,
Knut Magne Risvik

Microsoft, Oslo, Norway

Microsoft, Oslo, Norway

0009-0007-6503-8324
View Profile

,
Harsha Vardhan Simhadri

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0000-0002-9323-2227
View Profile

,
Manik Varma

Microsoft, Bengaluru, India

Microsoft, Bengaluru, India

0000-0003-4516-6613
View Profile

,
Yujing Wang

Microsoft, Beijing, China

Microsoft, Beijing, China

0000-0002-7940-5216
View Profile

,
Linjun Yang

Microsoft, Redmond, USA

Microsoft, Redmond, USA

0009-0001-1778-7167
View Profile

,
Mao Yang

Microsoft, Beijing, China

Microsoft, Beijing, China

0009-0009-6455-3898
View Profile

,
Ce Zhang

ETH Zürich, Zürich, Switzerland

ETH Zürich, Zürich, Switzerland

0000-0002-8105-7505
View Profile

WWW '24: Companion Proceedings of the ACM on Web Conference 2024May 2024Pages 292–301https://doi.org/10.1145/3589335.3648327

Published:13 May 2024Publication History

WWW '24: Companion Proceedings of the ACM on Web Conference 2024

Pages 292–301

ABSTRACT

Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demands innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. MS MARCO Web Search dataset is available at: https://github.com/microsoft/MS-MARCO-Web-Search.

Supplemental Material

ip6166.mp4

Supplemental video

mp4

74.7 MB

Download

References

Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana De Oliveira. 1996. Characterizing reference locality in the WWW. In Fourth International Conference on Parallel and Distributed Information Systems. IEEE, 92--103.Google ScholarCross Ref
er et al.(2017)]% aumuller2017ann, Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2017. ANN-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In International conference on similarity search and applications. Springer, 34--49.Google ScholarCross Ref
Artem Babenko and Victor Lempitsky. 2014. The inverted multi-index. IEEE transactions on pattern analysis and machine intelligence, Vol. 37, 6 (2014), 1247--1260.Google Scholar
Artem Babenko and Victor Lempitsky. 2016. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2055--2063.Google Scholar
Dmitry Baranchuk, Artem Babenko, and Yury Malkov. 2018. Revisiting the inverted indices for billion-scale approximate nearest neighbors. In Proceedings of the European Conference on Computer Vision (ECCV). 202--216.Google ScholarDigital Library
Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Scott Yih, Sebastian Riedel, and Fabio Petroni. 2022. Autoregressive search engines: Generating substrings as document identifiers. Advances in Neural Information Processing Systems , Vol. 35 (2022), 31668--31683.Google Scholar
Jamie Callan. 2012. The lemur project and its clueweb12 dataset. In Invited talk at the SIGIR 2012 Workshop on Open-Source Information Retrieval.Google Scholar
Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. Advances in Neural Information Processing Systems , Vol. 34 (2021), 5199--5212.Google Scholar
Charles Clarke, Nick Craswell, and Ian Soboroff. 2004. Overview of the TREC 2004 Terabyte Track. In TREC.Google Scholar
Charles LA Clarke, Nick Craswell, and Ian Soboroff. 2009. Overview of the TREC 2009 Web Track.. In Trec, Vol. 9. 20--29.Google Scholar
Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, and Bodo Billerbeck. 2020. ORCAS: 20 million clicked query-document pairs for analyzing search. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2983--2989.Google ScholarDigital Library
Zhuyun Dai and Jamie Callan. 2019. Context-aware sentence/passage term importance estimation for first stage retrieval. arXiv preprint arXiv:1910.10687 (2019).Google Scholar
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).Google Scholar
Luyu Gao and Jamie Callan. 2022. Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2843--2853.Google ScholarCross Ref
Steven Glassman. 1994. A caching relay for the world wide web. Computer Networks and ISDN systems , Vol. 27, 2 (1994), 165--173.Google Scholar
Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, and Xueqi Cheng. 2022. Semantic models for the first-stage retrieval: A comprehensive review. ACM Transactions on Information Systems (TOIS), Vol. 40, 4 (2022), 1--42.Google ScholarDigital Library
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. In Proceedings of the 37th International Conference on Machine Learning (ICML). 3887--3896.Google Scholar
Yelong Shen Jiancheng Lv Nan Duan Weizhu Chen Hang Zhang, Yeyun Gong. 2022. Adversarial Retriever-Ranker model for Dense Retrieval. In ICLR.Google Scholar
Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. Advances in neural information processing systems , Vol. 27 (2014).Google Scholar
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2333--2338.Google ScholarDigital Library
Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2019. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. In International Conference on Learning Representations.Google Scholar
Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node. Advances in Neural Information Processing Systems , Vol. 32 (2019).Google Scholar
Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, Vol. 33, 1 (2010), 117--128.Google Scholar
Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. 2011. Searching in one billion vectors: re-rank with source coding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 861--864.Google ScholarCross Ref
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data (2019).Google ScholarCross Ref
Yannis Kalantidis and Yannis Avrithis. 2014. Locally optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2321--2328.Google ScholarDigital Library
Vladimir Karpukhin, Barlas Oug uz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906 (2020).Google Scholar
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et al. 2019. Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics , Vol. 7 (2019), 453--466.Google ScholarCross Ref
Carlos Lassance and Stéphane Clinchant. 2023. Naver Labs Europe (SPLADE)@ TREC Deep Learning 2022. arXiv preprint arXiv:2302.12574 (2023).Google Scholar
Patrick Lewis, Pontus Stenetorp, and Sebastian Riedel. 2021. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 1000--1008.Google ScholarCross Ref
Wenhao Lu, Jian Jiao, and Ruofei Zhang. 2020. Twinbert: Distilling knowledge to twin-structured compressed BERT models for large-scale retrieval. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2645--2652.Google ScholarDigital Library
microsoft. [n.,d.] a. Bing search. https://www.bing.com/.Google Scholar
microsoft. [n.,d.] b. New Bing. https://www.bing.com/new.Google Scholar
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 (2021).Google Scholar
Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, et al. 2022. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005 (2022).Google Scholar
Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. In CoCo@ NIPS.Google Scholar
Arnold Overwijk, Chenyan Xiong, and Jamie Callan. 2022. ClueWeb22: 10 billion web documents with rich information. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3360--3362.Google ScholarDigital Library
Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. 2016. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, 4 (2016), 694--707.Google ScholarDigital Library
Yifan Qiao, Chenyan Xiong, Zhenghao Liu, and Zhiyuan Liu. 2019. Understanding the Behaviors of BERT in Ranking. arXiv preprint arXiv:1904.07531 (2019).Google Scholar
Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019).Google Scholar
Jie Ren, Minjia Zhang, and Dong Li. 2020. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vol. 33.Google Scholar
Stephen E Robertson and Steve Walker. 1994. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In SIGIR'94: Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, organised by Dublin City University. Springer, 232--241.Google ScholarCross Ref
Shota Sasaki, Shuo Sun, Shigehiko Schamoni, Kevin Duh, and Kentaro Inui. 2018. Cross-lingual learning-to-rank with shared representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 458--463.Google ScholarCross Ref
Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, and Hannaneh Hajishirzi. 2019. Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 4430--4441.Google ScholarCross Ref
Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, and Yuxiang Luo. 2021. GLOW: Global Weighted Self-Attention Network for Web Search. In 2021 IEEE International Conference on Big Data (Big Data). IEEE, 519--528.Google Scholar
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd international conference on world wide web. 373--374.Google ScholarDigital Library
Ian Soboroff. 2021. Overview of TREC 2021. In 30th Text REtrieval Conference. Gaithersburg, Maryland.Google Scholar
Suhas Jayaram Subramanya, Rohan Kadekodi, Ravishankar Krishaswamy, and Harsha Vardhan Simhadri. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 13766--13776.Google Scholar
Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. 2022. Transformer memory as a differentiable search index. Advances in Neural Information Processing Systems , Vol. 35 (2022), 21831--21843.Google Scholar
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).Google Scholar
Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, et al. 2022. A neural corpus indexer for document retrieval. Advances in Neural Information Processing Systems , Vol. 35 (2022), 25600--25614.Google Scholar
Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, et al. 2022. Distill-vq: Learning retrieval oriented vector quantization by distilling knowledge from dense embeddings. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1513--1523.Google ScholarDigital Library
Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2020. Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808 (2020).Google Scholar
Linlong Xu, Baosong Yang, Xiaoyu Lv, Tianchi Bi, Dayiheng Liu, and Haibo Zhang. 2021. Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval. arXiv preprint arXiv:2111.01992 (2021).Google Scholar
Jingtao Zhan, Xiaohui Xie, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, and Shaoping Ma. 2022. Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2486--2496.Google ScholarDigital Library
Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, and Nan Duan. 2022. SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track. 548--559.Google Scholar
Shengyao Zhuang, Hang Li, and G. Zuccon. 2021. Deep Query Likelihood Model for Information Retrieval. In ECIR. ioGoogle Scholar

Index Terms

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment

Recommendations

Implicit link analysis for small web search
SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval

Current Web search engines generally impose link analysis-based re-ranking on web-page retrieval. However, the same techniques, when applied directly to small web search such as intranet and site search, cannot achieve the same performance because their ...
Read More
Mining rich session context to improve web search
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

User browsing information, particularly their non-search related activity, reveals important contextual information on the preferences and the intent of web users. In this paper, we expand the use of browsing information for web search ranking and other ...
Read More
Exploiting site-level information to improve web search
CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine learning ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '24: Companion Proceedings of the ACM on Web Conference 2024
May 2024
1928 pages
ISBN:9798400701726
DOI:10.1145/3589335
General Chairs:
Tat-Seng Chua
National University of Singapore
,
Chong-Wah Ngo
Singapore Management University
,
Proceedings Chair:
Roy Ka-Wei Lee
Singapore University of Technology and Design
,
Program Chairs:
Ravi Kumar
Google
,
Hady W. Lauw
Singapore Management University
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 May 2024
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
dataset
information retrieval
web search
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 229
  Total Downloads
- Downloads (Last 12 months)229
- Downloads (Last 6 weeks)229
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

WWW '24: Companion Proceedings of the ACM on Web Conference 2024

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Implicit link analysis for small web search

Mining rich session context to improve web search

Exploiting site-level information to improve web search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

WWW '24: Companion Proceedings of the ACM on Web Conference 2024

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Implicit link analysis for small web search

Mining rich session context to improve web search

Exploiting site-level information to improve web search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media