ABSTRACT
Considering the multimodal signals of search items is beneficial for retrieval effectiveness. Especially in web table retrieval (WTR) experiments, accounting for the multimodal properties of tables boosts effectiveness. However, it remains an open question how the individual modalities affect the user experience in particular. Previous work analyzed WTR performance in ad-hoc retrieval benchmarks, a setting that neglects interactive search behavior and limits the conclusions that can be drawn about real-world user environments.
To this end, this work presents an in-depth evaluation of simulated interactive WTR search sessions as a more cost-efficient and reproducible alternative to real user studies. As a first of its kind, we introduce interactive query reformulation strategies based on Doc2Query that incorporate cognitive states of simulated user knowledge. Our evaluations cover two perspectives on user effectiveness by considering different cost paradigms, namely query-wise and time-oriented measures of effort. This multi-perspective evaluation scheme reveals new insights about query strategies, the impact of modalities, and different user types in simulated WTR search sessions.
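To make the simulation idea concrete, the following is a minimal, hypothetical sketch of a knowledge-conditioned reformulation loop: a pool of Doc2Query-style candidate queries is filtered by the simulated user's current knowledge state, and the state grows as results reveal new terms. The function names (`reformulate`, `simulate_session`) and the term-subset knowledge model are illustrative assumptions, not the authors' implementation.

```python
import random

def reformulate(candidate_queries, known_terms, rng):
    # A simulated user only issues a candidate query whose terms are
    # already part of its knowledge state (illustrative assumption).
    feasible = [q for q in candidate_queries
                if set(q.split()) <= known_terms]
    return rng.choice(feasible) if feasible else None

def simulate_session(candidate_queries, known_terms, learn, max_queries=3):
    # Issue up to max_queries reformulations; after each query, the
    # knowledge state grows by the terms the results expose (learn).
    known = set(known_terms)
    pool = list(candidate_queries)
    issued = []
    rng = random.Random(42)
    for _ in range(max_queries):
        q = reformulate(pool, known, rng)
        if q is None:          # no candidate is expressible yet: stop
            break
        issued.append(q)
        pool.remove(q)
        known |= learn(q)      # e.g., terms observed in result snippets
    return issued
```

A usage sketch: starting from `known_terms = {"world", "cup", "2018"}` and a `learn` callback that exposes `{"fifa", "winners"}`, the simulated user can eventually issue a candidate such as `"fifa world cup winners"` that was out of reach at session start.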
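The two cost paradigms can be illustrated with standard session measures; the sketch below is not necessarily the paper's exact instrumentation. Query-wise effort is captured by session-based DCG (Järvelin et al. 2008), which discounts gains by query position as well as rank; time-oriented effort by time-biased gain (Smucker & Clarke 2012), which decays gains exponentially in elapsed seconds. The parameter defaults (`bq=4`, `half_life=224`) follow those papers.

```python
import math

def session_dcg(session, bq=4.0):
    # session: one list of graded relevance labels (in rank order) per
    # query. Gains get the standard log2 rank discount plus a query-
    # position discount, so later reformulations contribute less.
    total = 0.0
    for q, ranking in enumerate(session, start=1):
        q_disc = 1.0 / (1.0 + math.log(q, bq))   # 1.0 for the first query
        dcg = sum(rel / math.log2(r + 1)
                  for r, rel in enumerate(ranking, start=1))
        total += q_disc * dcg
    return total

def time_biased_gain(events, half_life=224.0):
    # events: (seconds_elapsed, gain) pairs for relevant documents the
    # user reaches; gain halves every half_life seconds.
    return sum(g * 2.0 ** (-t / half_life) for t, g in events)
```

For example, a second-query hit is worth only two thirds of a first-query hit under `session_dcg`, while under `time_biased_gain` a document reached after 224 seconds is worth half of one reached immediately.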
Index Terms
- Simulating Users in Interactive Web Table Retrieval