research-article

Web-based information content and its application to concept-based video retrieval

Authors:
Alexander Haubold, Columbia University, New York, NY, USA
Apostol Natsev, IBM Thomas J. Watson, Hawthorne, NY, USA

CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval, July 2008, Pages 437–446, https://doi.org/10.1145/1386352.1386408

Published: 07 July 2008


ABSTRACT

Semantic similarity between words or phrases is frequently used to match search queries to documents when straightforward term matching fails. This is particularly important for searching in visual databases, where pictures or video clips have been automatically tagged with a small set of semantic concepts based on analysis and classification of the visual content. Here, the textual description of documents is very limited, and semantic similarity based on WordNet's cognitive synonym structure, along with information content derived from term frequencies, can help to bridge the gap between an arbitrary textual query and a limited vocabulary of visual concepts. This approach, termed concept-based retrieval, has received significant attention over the last few years, and its success is highly dependent on the quality of the similarity measure used to map textual query terms to visual concepts.
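The mapping from query terms to a fixed concept vocabulary can be sketched as follows. This is a minimal illustration and not the authors' system: the function name `map_query_to_concepts`, the concept names, and the toy similarity table are all hypothetical, and a real system would plug in a WordNet-based similarity measure in place of `toy_similarity`.

```python
from typing import Callable, Dict, List, Tuple


def map_query_to_concepts(
    query_terms: List[str],
    concepts: List[str],
    similarity: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank visual concepts by their best similarity to any query term."""
    scores: Dict[str, float] = {}
    for concept in concepts:
        # default=0.0 keeps an empty query from raising ValueError.
        scores[concept] = max(
            (similarity(term, concept) for term in query_terms), default=0.0
        )
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical scores standing in for a WordNet/IC-based similarity measure.
TOY_SIMILARITY = {
    ("jet", "airplane"): 0.9,
    ("jet", "vehicle"): 0.6,
    ("jet", "building"): 0.1,
    ("runway", "airplane"): 0.4,
    ("runway", "vehicle"): 0.3,
    ("runway", "building"): 0.2,
}


def toy_similarity(term: str, concept: str) -> float:
    """Look up a pair in the toy table; unknown pairs score 0.0."""
    return TOY_SIMILARITY.get((term, concept), 0.0)


ranked = map_query_to_concepts(
    ["jet", "runway"], ["airplane", "vehicle", "building"], toy_similarity, top_k=2
)
print(ranked)  # [('airplane', 0.9), ('vehicle', 0.6)]
```

Taking the maximum over query terms is only one possible aggregation; summing or weighting the per-term scores would be equally easy to swap in.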

In this paper, we consider some issues of semantic similarity measures based on Information Content (IC), and propose a way to improve them. In particular, we note that most IC-based similarity measures are derived from a small and relatively outdated corpus (the Brown corpus), which does not adequately capture the usage pattern of many contemporary terms: for example, out of more than 150,000 WordNet terms, only about 36,000 are represented. This shortcoming severely limits the coverage of typical search query terms. We therefore suggest using alternative IC corpora that are larger and better aligned with the usage of modern vocabulary. We experimentally derive two such corpora using the Google web search engine, and show that they provide better coverage of vocabulary, while showing comparable frequencies for Brown corpus terms. Finally, we evaluate the two proposed IC corpora in the context of a concept-based video retrieval application using the TRECVID 2005, 2006, and 2007 datasets, and we show that they increase average precision results by up to 200%.
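To make the coverage problem concrete, the sketch below derives information content from term counts over a tiny taxonomy and computes Lin's similarity, one of several IC-based measures (the abstract does not say which measures the paper evaluates). The taxonomy, the counts, and all function names are hypothetical. IC is taken as IC(c) = log(N / count(c)), where count(c) includes the counts of all descendants of c and N is the count at the root.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "airplane": "vehicle",
    "helicopter": "vehicle",  # present in the taxonomy, absent from the corpus
    "animal": "entity",
    "dog": "animal",
}

# Hypothetical raw term counts from some corpus.
TERM_COUNTS: Dict[str, int] = {"car": 50, "airplane": 10, "dog": 40}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def information_content(term_counts: Dict[str, int]) -> Dict[str, float]:
    """IC(c) = log(N / count(c)); concepts with zero count get no IC value."""
    totals = {c: 0 for c in PARENT}
    for term, count in term_counts.items():
        for node in ancestors(term):  # propagate counts up the hierarchy
            totals[node] += count
    root = next(c for c, p in PARENT.items() if p is None)
    return {c: math.log(totals[root] / n) for c, n in totals.items() if n > 0}


def lin_similarity(a: str, b: str, ic: Dict[str, float]) -> Optional[float]:
    """Lin similarity 2*IC(lcs) / (IC(a) + IC(b)); None if a term lacks IC."""
    if a not in ic or b not in ic:
        return None
    ancestors_b = set(ancestors(b))
    common = [c for c in ancestors(a) if c in ancestors_b]
    lcs_ic = max(ic[c] for c in common)  # most informative common ancestor
    denominator = ic[a] + ic[b]
    return 2.0 * lcs_ic / denominator if denominator > 0 else 0.0


ic = information_content(TERM_COUNTS)
print(lin_similarity("car", "airplane", ic))    # related: share "vehicle"
print(lin_similarity("car", "dog", ic))         # only the root in common
print(lin_similarity("car", "helicopter", ic))  # None: term not covered
```

In this toy setup, "helicopter" behaves like a WordNet term that never occurs in the corpus: no IC can be assigned, so the measure returns nothing for it. By the abstract's figures, at most about 36,000 / 150,000 = 24% of WordNet terms avoid this situation with the Brown corpus.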



Reviews

Reviewer: Jonathan P. E. Hodgson (Online Computing Reviews Service)

A concept-based index for a collection of videos can be constructed using a set of statistical detectors for a fixed set of semantic concepts. The resulting index uses only the limited number of concepts based on the original set of semantic concepts; however, the index can be enhanced by using a corpus, such as the Brown corpus, and WordNet to map a larger set of concepts onto the smaller set used to create the original index. This mapping is derived in part from measuring the information content of each concept, which in turn is derived from the original corpus. The size and coverage of the corpus are therefore critical ingredients of the process. Given the age of the Brown corpus, there are a number of deficiencies related to its use because many of the words in WordNet do not appear in the corpus. This paper proposes two methods for constructing a new assignment of information content to concepts from WordNet. In the first, a corpus is constructed by taking for each concept in WordNet the first ten documents retrieved by Google. In the second method, the number of pages found by Google for each word serves as the basis for computing the information content of the word; thus, the Google knowledge base is in some sense the corpus. These approaches are used to construct enhanced context indexes for the TREC Video Retrieval Evaluation (TRECVID) datasets that show substantially better performance than the systems based on the Brown corpus. The effectiveness of the "pages retrieved" count strategy is a particularly striking example of the use of the Web as a resource for large-scale data.
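The second method described in the review, using page counts as term frequencies, might look roughly like the sketch below. The page counts, the total index size, and the function names are hypothetical, and the normalization IC(w) = log(N / hits(w)) is an assumption made for illustration; the review does not state how the paper normalizes the counts.

```python
import math
from typing import Dict, Iterable


def ic_from_page_counts(
    page_counts: Dict[str, int], total_pages: int
) -> Dict[str, float]:
    """Assign IC(w) = log(total_pages / hits(w)) to every term with hits > 0."""
    return {
        term: math.log(total_pages / hits)
        for term, hits in page_counts.items()
        if hits > 0
    }


def coverage(vocabulary: Iterable[str], ic: Dict[str, float]) -> float:
    """Fraction of vocabulary terms that received an IC value."""
    terms = list(vocabulary)
    if not terms:
        return 0.0
    return sum(1 for term in terms if term in ic) / len(terms)


# Hypothetical page counts, as a search engine might report them.
PAGE_COUNTS = {"car": 1_000_000, "blog": 500_000, "podcast": 0}
TOTAL_PAGES = 10_000_000  # assumed size of the index, for illustration only

web_ic = ic_from_page_counts(PAGE_COUNTS, TOTAL_PAGES)
print(web_ic["blog"] > web_ic["car"])  # True: rarer terms carry more information
print(coverage(["car", "blog", "podcast", "dog"], web_ic))  # 0.5
```

The `coverage` helper mirrors the comparison the paper draws between corpora: the same vocabulary can be scored against Brown-derived and Web-derived IC tables to see how many terms each one covers.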


Published in
CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval
July 2008
674 pages
ISBN: 9781605580708
DOI: 10.1145/1386352
General Chairs: Jiebo Luo (Kodak Research Laboratories), Ling Guan (Ryerson University)
Program Chairs: Alan Hanjalic (Delft University of Technology), Mohan Kankanhalli (National University of Singapore), Ivan Lee (University of South Australia)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LSCOM
TRECVid
WordNet
Brown corpus
information content
semantic similarity