Abstract
Term mismatch between queries and documents has long been recognized as a key problem in information retrieval (IR). Based on our analysis of a large-scale web query log and of relevant documents in standard test collections, we attempt to detect topic transitions between the topical categories of a query and those of its relevant documents (or clicked pages), and create a Topic Transition Map (TTM) that captures how query topic categories are linked to those of relevant or clicked documents. The TTM, a kind of click graph at the semantic level, is then used for query expansion by suggesting terms associated with the document categories that are strongly related to the query category. Unlike most other query expansion methods, which either interpret the semantics of queries using a thesaurus-like resource or use the content of a small number of relevant documents, our method retrieves documents that fall within the semantic affinity of the multiple categories of documents relevant to queries of a similar kind. Our experiments show that the proposed method is more effective than other representative query expansion methods, such as standard relevance feedback, pseudo-relevance feedback, and thesaurus-based query expansion.
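As a rough illustration of the idea summarized above, the following minimal sketch builds a toy transition map from (query category, document category) pairs and uses it to pick expansion terms. It is not the authors' implementation: the function names `build_ttm` and `expand_query`, the row-normalized counting scheme, the category labels, and the per-category term lists are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_ttm(pairs):
    """Count query-category -> document-category transitions and normalise
    each row into relative frequencies (an assumed weighting scheme)."""
    counts = defaultdict(Counter)
    for query_cat, doc_cat in pairs:
        counts[query_cat][doc_cat] += 1
    ttm = {}
    for query_cat, row in counts.items():
        total = sum(row.values())
        ttm[query_cat] = {doc_cat: n / total for doc_cat, n in row.items()}
    return ttm


def expand_query(query_terms, query_cat, ttm, category_terms,
                 top_cats=2, terms_per_cat=2):
    """Append terms from the document categories most strongly linked
    to the query category in the transition map."""
    row = ttm.get(query_cat, {})
    # Strongest transitions first; ties are broken by category name.
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:top_cats]
    expanded = list(query_terms)
    for doc_cat, _weight in ranked:
        for term in category_terms.get(doc_cat, [])[:terms_per_cat]:
            if term not in expanded:
                expanded.append(term)
    return expanded


# Hypothetical toy data: categories of queries and of their clicked pages.
pairs = [
    ("Health", "Health"),
    ("Health", "Health"),
    ("Health", "Science"),
    ("Health", "Shopping"),
    ("Sports", "Sports"),
]
# Hypothetical terms associated with each document category.
category_terms = {
    "Health": ["symptom", "treatment"],
    "Science": ["study", "research"],
    "Shopping": ["price", "store"],
}

ttm = build_ttm(pairs)
expanded = expand_query(["flu"], "Health", ttm, category_terms)
print(expanded)
```

In this sketch a query categorized as "Health" borrows terms not only from its own category but also from the next most strongly linked document category, which is the kind of cross-category transition the map is meant to capture.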
Notes
1. Hereafter, we refer to queries and relevant documents without loss of generality, since our analysis uses the text of the actual pages corresponding to the clicked URLs.
2. We used the PL2 weighting model that Terrier provides. Although we tested other weighting schemes, such as BM25, DFR_BM25, and tf-idf, PL2 showed the best performance on the MAP and P@n measures when tested with TREC-4.
3. The evaluteIR.org ALPHA web site provides useful information about IR test sets and systems for comparison. Available at: https://web.archive.org/web/20150222083239/http://wice.csse.unimelb.edu.au:15000/evalweb/ireval/
Acknowledgments
This work was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-15-0176, Development of Core Technology for Human-like Self-taught Learning based on a Symbolic Approach) and by an Industrial Strategic Technology Development Program grant funded by the Ministry of Trade, Industry & Energy (MI, Korea) (No. 10052955, Experiential Knowledge Platform Development Research for the Acquisition and Utilization of Field Expert Knowledge).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kim, Km., Jung, Y., Myaeng, SH. (2016). A Topic Transition Map for Query Expansion: A Semantic Analysis of Click-Through Data and Test Collections. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_57
DOI: https://doi.org/10.1007/978-3-319-50127-7_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)