DUT-WEBV: A Benchmark Dataset for Performance Evaluation of Tag Localization for Web Video

Li, Haojie; Yi, Lei; Guan, Yue; Zhang, Hao

doi:10.1007/978-3-642-35728-2_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7733)

Abstract

Social videos are now pervasive on the Web. They are characterized by rich accompanying contextual information that describes the video content and thus greatly facilitates video search and browsing. Such contextual data, for example tags, are generally attached to the whole video, with no temporal indication of when the tagged content actually appears. However, many tags describe only part of the video content. Tag localization, the process of assigning tags to the relevant underlying video segments or frames, is therefore attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains 1550 videos collected from YouTube by issuing 31 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, and pose considerable challenges for the tag (i.e., concept) localization task. For each video of a tag, we carefully annotate the time intervals during which the tag appears in the video. Besides the videos themselves, contextual information such as thumbnail images, titles, and categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization based on a multiple instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
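
To make the annotation idea concrete, the following is a minimal sketch of how per-tag time intervals could be represented and how a predicted localization could be scored against them. It is an illustration under stated assumptions, not the dataset's actual file format or the paper's evaluation protocol: the `VideoAnnotation` fields, the (start, end) interval format in seconds, and the duration-based precision and recall are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical format: (start_sec, end_sec) with start < end.
Interval = Tuple[float, float]


@dataclass
class VideoAnnotation:
    """Assumed container for one annotated video; field names are illustrative."""
    video_id: str             # hypothetical identifier
    tag: str                  # query concept, e.g. "flood"
    segments: List[Interval]  # intervals during which the tag is visible


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> float:
    """Total duration covered by a set of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def overlap_length(a: List[Interval], b: List[Interval]) -> float:
    """Duration covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def precision_recall(predicted: List[Interval],
                     annotation: VideoAnnotation) -> Tuple[float, float]:
    """Duration-based precision and recall of a predicted localization."""
    hit = overlap_length(predicted, annotation.segments)
    pred_len = total_length(predicted)
    true_len = total_length(annotation.segments)
    precision = hit / pred_len if pred_len else 0.0
    recall = hit / true_len if true_len else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = VideoAnnotation("video_0001", "flood", [(10.0, 20.0), (40.0, 50.0)])
    guess = [(15.0, 25.0), (45.0, 50.0)]
    print(precision_recall(guess, truth))
```

Scoring by overlapping duration is only one possible choice; a shot-level or frame-level protocol would count discrete units instead, and the abstract does not say which one the benchmark uses.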

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, China
Haojie Li, Lei Yi, Yue Guan & Hao Zhang

Editor information

Editors and Affiliations

Microsoft Research Asia, 5 Danling Street, 100080, Beijing, China
Shipeng Li & Tao Mei
School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward, K1N 6N5, Ottawa, ON, Canada
Abdulmotaleb El Saddik
School of Computer and Information, Hefei University of Technology, Road Tunxi 193#, 230009, Hefei, Anhui, China
Meng Wang & Richang Hong
Department of Information Engineering and Computer Science, University of Trento, ommarive 14, 38100, Trento, Italy
Nicu Sebe
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117583, Singapore, Singapore
Shuicheng Yan
School of Computing, CLARITY: Centre for Sensor Web Technologies, Dublin City University, Glasnevin, 9, Dublin, Ireland
Cathal Gurrin

Cite this paper

Li, H., Yi, L., Guan, Y., Zhang, H. (2013). DUT-WEBV: A Benchmark Dataset for Performance Evaluation of Tag Localization for Web Video. In: Li, S., et al. Advances in Multimedia Modeling. Lecture Notes in Computer Science, vol 7733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35728-2_29

DOI: https://doi.org/10.1007/978-3-642-35728-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35727-5
Online ISBN: 978-3-642-35728-2
eBook Packages: Computer Science (R0)
