A Topic Transition Map for Query Expansion: A Semantic Analysis of Click-Through Data and Test Collections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Abstract

Term mismatch between queries and documents has long been recognized as a key problem in information retrieval (IR). Based on our analysis of a large-scale web query log and of relevant documents in standard test collections, we detect topic transitions between the topical categories of a query and those of its relevant documents (or clicked pages), and create a Topic Transition Map (TTM) that captures how query topic categories are linked to those of relevant or clicked documents. The TTM, a kind of click graph at the semantic level, is then used for query expansion by suggesting the terms associated with the document categories strongly related to the query category. Most other query expansion methods either interpret the semantics of queries using a thesaurus-like resource or use the content of a small number of relevant documents. In contrast, our method retrieves documents that are semantically close to the multiple categories of documents found relevant for similar queries. Our experiments show that the proposed method is more effective than other representative query expansion methods such as standard relevance feedback, pseudo relevance feedback, and thesaurus-based query expansion.
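The idea of a category-level map used for expansion can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how links are weighted or how expansion terms are chosen, so the row-normalized co-occurrence weights, the function names `build_ttm` and `expand_query`, the category labels, and the per-category term lists below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_ttm(pairs):
    """Build a toy topic transition map.

    pairs: iterable of (query_category, document_category) observations,
    e.g. one per click or relevance judgment (hypothetical input format).
    Returns {query_category: {document_category: weight}}, where the weights
    of each query category sum to 1 (an assumed normalization).
    """
    counts = defaultdict(Counter)
    for q_cat, d_cat in pairs:
        counts[q_cat][d_cat] += 1
    ttm = {}
    for q_cat, row in counts.items():
        total = sum(row.values())
        ttm[q_cat] = {d_cat: n / total for d_cat, n in row.items()}
    return ttm


def expand_query(query_terms, q_cat, ttm, category_terms,
                 top_categories=2, terms_per_category=2):
    """Append terms from the document categories most strongly linked to q_cat."""
    expanded = list(query_terms)
    row = ttm.get(q_cat, {})
    # Strongest links first; ties are broken alphabetically for determinism.
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    for d_cat, _weight in ranked[:top_categories]:
        for term in category_terms.get(d_cat, [])[:terms_per_category]:
            if term not in expanded:
                expanded.append(term)
    return expanded


# Hypothetical observations: (category of the query, category of the clicked page).
pairs = [
    ("Health", "Health"),
    ("Health", "Science"),
    ("Health", "Science"),
    ("Health", "Shopping"),
    ("Sports", "Sports"),
]
# Hypothetical representative terms for each document category.
category_terms = {
    "Science": ["clinical", "study"],
    "Health": ["symptom", "treatment"],
    "Shopping": ["price"],
}

ttm = build_ttm(pairs)
expanded = expand_query(["flu", "vaccine"], "Health", ttm, category_terms)
print(expanded)
# ['flu', 'vaccine', 'clinical', 'study', 'symptom', 'treatment']
```

In this toy data a "Health" query leads to "Science" pages more often than to "Health" pages, so the expansion draws first on terms from a category other than the query's own. That is the kind of cross-category link a thesaurus lookup on the query alone would not surface.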

Notes

  1. Hereafter we refer to queries and relevant documents without loss of generality, since our analysis uses the text of the actual pages corresponding to the clicked URLs.

  2. We used the PL2 weighting model that Terrier provides. Although we tested other weighting schemes, such as BM25, DFR_BM25, and tf-idf, PL2 showed the best performance on the MAP and P@n measures when tested with TREC-4.

  3. The evaluatIR.org ALPHA web site provides useful information about IR test sets and systems for comparison. Available at: https://web.archive.org/web/20150222083239/http://wice.csse.unimelb.edu.au:15000/evalweb/ireval/

Acknowledgments

This work was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-15-0176, Development of Core Technology for Human-like Self-taught Learning based on a Symbolic Approach) and by an Industrial Strategic Technology Development Program grant funded by the Ministry of Trade, Industry & Energy (MI, Korea) (No. 10052955, Experiential Knowledge Platform Development Research for the Acquisition and Utilization of Field Expert Knowledge).

Corresponding author

Correspondence to Kyung-min Kim.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kim, Km., Jung, Y., Myaeng, SH. (2016). A Topic Transition Map for Query Expansion: A Semantic Analysis of Click-Through Data and Test Collections. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_57

  • DOI: https://doi.org/10.1007/978-3-319-50127-7_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7

  • eBook Packages: Computer Science (R0)
