Word sense disambiguation to improve precision for ambiguous queries

Adrian-Gabriel Chifu; Radu-Tudor Ionescu

doi:10.2478/s13537-012-0032-6

Open Access. Published by De Gruyter Open Access, December 28, 2012

From the journal Open Computer Science

Abstract

Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering technique and as a re-ranking technique. Using the TREC ad hoc collection, we show that WSD is useful for queries that are difficult because of sense ambiguity. We focus on improving precision after 5, 10, and 30 retrieved documents (P@5, P@10, and P@30) for these lowest-precision queries.
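The abstract names two ways of applying a Naïve Bayes sense classifier to a ranked result list (filtering and re-ranking) and evaluates them with precision at a fixed cutoff. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation: it assumes a supervised multinomial Naïve Bayes model with add-one smoothing over bag-of-words contexts, a handful of hypothetical sense-labelled training contexts, and that each retrieved document has already been reduced to the words around the ambiguous query term. All function names (train_nb, filter_ranking, rerank, precision_at_k) and the toy data are hypothetical. Re-ranking here simply moves documents whose predicted sense matches the query sense ahead of the others; the paper's actual features, training procedure, and score combination are not described in this excerpt.

```python
import math
from collections import Counter

# Hypothetical sketch: supervised multinomial Naive Bayes over bag-of-words
# contexts with add-alpha (Laplace) smoothing. Not the paper's exact model.


def train_nb(examples, alpha=1.0):
    """examples: list of (sense, context_words) pairs."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    total = sum(sense_counts.values())
    model = {}
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + alpha * len(vocab)
        log_prior = math.log(count / total)
        log_likelihood = {
            w: math.log((word_counts[sense][w] + alpha) / denom) for w in vocab
        }
        model[sense] = (log_prior, log_likelihood)
    return model


def sense_scores(model, words):
    """Unnormalised log P(sense) + sum of log P(word | sense); unknown words are skipped."""
    return {
        sense: prior + sum(loglik[w] for w in words if w in loglik)
        for sense, (prior, loglik) in model.items()
    }


def posterior(model, words, sense):
    """P(sense | words), normalised with the log-sum-exp trick."""
    scores = sense_scores(model, words)
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[sense] - top) / z


def predicted_sense(model, words):
    scores = sense_scores(model, words)
    return max(scores, key=scores.get)


def filter_ranking(ranking, docs, model, query_sense):
    """Filtering: drop documents whose predicted sense differs from the query sense."""
    return [d for d in ranking if predicted_sense(model, docs[d]) == query_sense]


def rerank(ranking, docs, model, query_sense):
    """Re-ranking: matching documents first. sorted() is stable, so the
    original order is kept inside each group."""
    return sorted(ranking, key=lambda d: predicted_sense(model, docs[d]) != query_sense)


def precision_at_k(ranking, relevant, k):
    """P@k: fraction of the top k retrieved documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


# Toy data for the ambiguous term "bank" (hypothetical).
examples = [
    ("finance", ["money", "loan", "account"]),
    ("finance", ["deposit", "money", "interest"]),
    ("river", ["water", "shore", "fish"]),
    ("river", ["water", "erosion", "shore"]),
]
docs = {
    "d1": ["water", "fish"],
    "d2": ["money", "loan"],
    "d3": ["shore", "erosion"],
    "d4": ["interest", "account"],
}
ranking = ["d1", "d2", "d3", "d4"]  # baseline IR ranking
relevant = {"d2", "d4"}             # the query targets the financial sense

model = train_nb(examples)
print(filter_ranking(ranking, docs, model, "finance"))  # ['d2', 'd4']
reranked = rerank(ranking, docs, model, "finance")
print(reranked)                                          # ['d2', 'd4', 'd1', 'd3']
print(precision_at_k(ranking, relevant, 2), precision_at_k(reranked, relevant, 2))  # 0.5 1.0
```

Filtering shortens the list, so on a real collection it can leave fewer than k documents for P@5, P@10, or P@30; re-ranking keeps every retrieved document and only changes their order.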

Keywords: information retrieval; word sense disambiguation; naïve Bayes classification; difficult queries; ambiguous queries; document clustering; fusion functions

Published Online: 2012-12-28

Published in Print: 2012-12-1

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.