ABSTRACT
Semantic similarity between words or phrases is frequently used to match search queries to documents when straightforward term matching fails. This is particularly important for searching visual databases, where pictures or video clips have been automatically tagged with a small set of semantic concepts based on analysis and classification of the visual content. Here, the textual description of documents is very limited, and semantic similarity based on WordNet's cognitive synonym structure, along with information content derived from term frequencies, can help to bridge the gap between an arbitrary textual query and a limited vocabulary of visual concepts. This approach, termed concept-based retrieval, has received significant attention over the last few years, and its success is highly dependent on the quality of the similarity measure used to map textual query terms to visual concepts.
In this paper, we examine limitations of semantic similarity measures based on Information Content (IC) and propose a way to improve them. In particular, we note that most IC-based similarity measures are derived from a small and relatively outdated corpus (the Brown corpus), which does not adequately capture the usage patterns of many contemporary terms: for example, out of more than 150,000 WordNet terms, only about 36,000 are represented. This shortcoming severely limits the coverage of typical search query terms. We therefore suggest using alternative IC corpora that are larger and better aligned with the usage of modern vocabulary. We experimentally derive two such corpora using the Google web search engine, and show that they provide better coverage of vocabulary while yielding comparable frequencies for Brown corpus terms. Finally, we evaluate the two proposed IC corpora in the context of a concept-based video retrieval application using the TRECVID 2005, 2006, and 2007 datasets, and we show that they increase average precision by up to 200%.
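To make the role of term frequencies concrete, the following is a minimal sketch of how information content can be derived from counts and used in a Resnik-style similarity, where two concepts are as similar as the IC of their most informative shared ancestor. It is not the paper's implementation: the toy taxonomy `PARENT`, the counts in `RAW_COUNTS`, and all function names are hypothetical stand-ins for WordNet hypernym links and for corpus or web-derived frequencies.

```python
import math

# Hypothetical toy taxonomy: child -> parent (stand-in for WordNet hypernyms).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "dog": "animal",
}

# Hypothetical raw term counts (e.g. corpus frequencies or search hit counts).
RAW_COUNTS = {"entity": 0, "vehicle": 10, "animal": 10,
              "car": 50, "truck": 20, "dog": 10}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def cumulative_counts(raw):
    """Credit each concept's count to itself and to every ancestor."""
    totals = {c: 0 for c in PARENT}
    for concept, count in raw.items():
        for node in ancestors(concept):
            totals[node] += count
    return totals


def information_content(raw, root="entity"):
    """IC(c) = -log p(c), with p(c) = cumulative count / root count."""
    totals = cumulative_counts(raw)
    # Concepts with a zero cumulative count have no defined IC and are skipped.
    return {c: -math.log(t / totals[root]) for c, t in totals.items() if t > 0}


def resnik_similarity(a, b, ic):
    """Similarity = IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic[c] for c in common)


IC = information_content(RAW_COUNTS)
print(resnik_similarity("car", "truck", IC))  # IC of "vehicle"
print(resnik_similarity("car", "dog", IC))    # IC of the root, i.e. zero
```

In this sketch a concept whose cumulative count is zero receives no IC value at all, which mirrors the coverage problem described above: terms absent from the frequency source cannot be scored, so a larger source of counts directly widens the set of query terms that can be mapped to concepts.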
Index Terms
- Web-based information content and its application to concept-based video retrieval