DUT-WEBV: A Benchmark Dataset for Performance Evaluation of Tag Localization for Web Video

  • Conference paper
Advances in Multimedia Modeling

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7733))

Abstract

Social videos are now pervasive on the Web. They are characterized by rich accompanying contextual information that describes their content and thus greatly facilitates video search and browsing. Such context data, for example tags, are generally generated for the whole video, with no temporal indication of when they actually appear in it. However, many tags describe only parts of the video content. Tag localization, the process of assigning tags to the relevant underlying video segments or frames, is therefore attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains 1550 videos collected from the YouTube portal by issuing 31 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, which makes the tag (i.e., concept) localization task highly challenging. For each video of a tag, we carefully annotate the time durations when the tag appears in the video. Besides the videos themselves, contextual information such as thumbnail images, titles, and categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization using a multiple instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
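As a rough illustration of how time-duration annotations of this kind could be used to score a tag localization result, the following minimal Python sketch converts annotated intervals into per-frame labels and computes precision and recall. The interval format (start and end in seconds), the one-frame-per-second sampling, and all function names are assumptions made for illustration; the abstract does not specify the dataset's annotation format or evaluation protocol. The `bag_is_positive` helper shows only the standard multiple instance learning assumption (a video carries a tag if at least one of its frames does) and is not necessarily the authors' baseline.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotation format: (start_sec, end_sec) spans where the tag is visible.
Interval = Tuple[float, float]


def frame_labels(intervals: Sequence[Interval], timestamps: Sequence[float]) -> List[bool]:
    """Mark each sampled timestamp as relevant if it falls inside any annotated interval."""
    return [any(start <= t <= end for start, end in intervals) for t in timestamps]


def precision_recall(predicted: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Frame-level precision and recall of a predicted localization."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bag_is_positive(instance_labels: Sequence[bool]) -> bool:
    """Standard MIL assumption: a bag (video) is positive if any instance (frame) is."""
    return any(instance_labels)


# Toy example: a 12-second video sampled once per second (made-up values).
timestamps = [float(t) for t in range(12)]
annotated = [(2.0, 5.0), (9.0, 10.0)]           # ground-truth spans for one tag
truth = frame_labels(annotated, timestamps)
predicted = [3.0 <= t <= 6.0 for t in timestamps]  # output of some hypothetical localizer

precision, recall = precision_recall(predicted, truth)
print(precision, recall)
```

In this toy case the localizer recovers three of the six annotated frames and adds one false positive. Frame-level scoring is only one possible choice; shot-level or interval-overlap measures would follow the same pattern.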

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, H., Yi, L., Guan, Y., Zhang, H. (2013). DUT-WEBV: A Benchmark Dataset for Performance Evaluation of Tag Localization for Web Video. In: Li, S., et al. Advances in Multimedia Modeling. Lecture Notes in Computer Science, vol 7733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35728-2_29

  • DOI: https://doi.org/10.1007/978-3-642-35728-2_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35727-5

  • Online ISBN: 978-3-642-35728-2

  • eBook Packages: Computer Science (R0)
