ABSTRACT
An analysis of YouTube, the leading social video sharing platform, reveals a high degree of redundancy in the form of videos with overlapping or duplicated content. In this paper, we show that this redundancy can provide useful information about connections between videos. We uncover these links using robust content-based video analysis techniques and exploit them to generate new tag assignments. To this end, we propose several tag propagation methods for automatically obtaining richer video annotations. Our techniques provide users with additional information about videos and lead to enhanced feature representations for applications such as automatic data organization and search. Experiments on video clustering and classification, as well as a user evaluation, demonstrate the viability of our approach.
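As a rough illustration of the idea of propagating tags along content links, the following sketch may help. It is not the paper's method: the abstract does not specify the propagation rules, so the scoring scheme (summing link weights over neighbours that carry a tag), the `threshold` parameter, the function name `propagate_tags`, and the sample data are all assumptions made for this example. Links are assumed to be symmetric and to have been found already by some content-based overlap detector.

```python
from collections import defaultdict


def propagate_tags(tags, links, threshold=0.5):
    """Suggest new tags for each video from its content-linked neighbours.

    tags:  dict mapping video id -> set of existing tags
    links: dict mapping (video_a, video_b) -> overlap weight in [0, 1]
           (hypothetical output of a content-based overlap detector)
    A tag is suggested for a video when the summed weight of linked
    videos carrying that tag reaches `threshold` (an assumed rule).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for (a, b), weight in links.items():
        # Treat each link as symmetric: tags flow in both directions.
        for src, dst in ((a, b), (b, a)):
            for tag in tags.get(src, set()):
                # Only score tags the destination video does not have yet.
                if tag not in tags.get(dst, set()):
                    scores[dst][tag] += weight

    suggestions = {}
    for video, tag_scores in scores.items():
        accepted = {tag for tag, score in tag_scores.items() if score >= threshold}
        if accepted:
            suggestions[video] = accepted
    return suggestions


# Hypothetical sample data: v1 and v2 overlap strongly, v2 and v3 weakly.
sample_tags = {"v1": {"concert", "live"}, "v2": {"concert"}, "v3": {"cat"}}
sample_links = {("v1", "v2"): 0.8, ("v2", "v3"): 0.2}
suggested = propagate_tags(sample_tags, sample_links, threshold=0.5)
```

With the sample data, only the strong link between `v1` and `v2` contributes a suggestion at the default threshold; lowering the threshold lets weaker links contribute as well, at the cost of noisier tags.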