
Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis

Published: 23 October 2006

ABSTRACT

TV advertising is ubiquitous, persistent, and economically vital. TV commercials affect the living and working habits of millions of people. In this paper, we present a multimodal ("visual + audio + text") commercial video digest scheme that segments individual commercials in TV streams and carries out semantic content analysis within each detected commercial segment. Two challenging issues are addressed. First, we propose a multimodal approach to robustly detect the boundaries of individual commercials. Second, we attempt to classify a commercial with respect to the advertised products/services. For the first, boundary detection of individual commercials is reduced to binary classification of shot boundaries using mid-level features derived from two concepts: Image Frames Marked with Product Information (FMPI) and Audio Scene Change Indicator (ASCI). Moreover, accurate individual boundaries enable commercial identification by clip matching via a spatial-temporal signature. For the second, commercial classification is formulated as a text categorization task in which the sparse texts from ASR/OCR are expanded with external knowledge. Our boundary detection achieves F1 = 93.7% on a dataset comprising 499 individual commercials from the TRECVID'05 video corpus. Commercial classification obtains an accuracy of 80.9% on 141 distinct commercials. Building on these results, applications such as an intelligent digital TV set-top box could help TV viewers monitor and manage commercials in TV streams.
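To make the reduction to binary classification of shot boundaries and the F1 measure concrete, here is a minimal Python sketch. It is not the authors' method: the `ShotBoundary` fields, the linear decision rule, the weights, the threshold, and the toy data are all assumptions made for illustration, and the abstract does not say how FMPI and ASCI are computed or which classifier is trained on them.

```python
from dataclasses import dataclass


@dataclass
class ShotBoundary:
    time_s: float  # position of the shot cut in the stream, in seconds
    fmpi: float    # hypothetical FMPI-derived score in [0, 1]
    asci: float    # hypothetical ASCI-derived score in [0, 1]


def is_commercial_boundary(b, w_fmpi=0.5, w_asci=0.5, threshold=0.5):
    """Toy linear rule standing in for a trained binary classifier."""
    return w_fmpi * b.fmpi + w_asci * b.asci >= threshold


def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 from two parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up candidate shot boundaries and ground-truth labels (illustrative only).
boundaries = [
    ShotBoundary(12.0, 0.9, 0.8),
    ShotBoundary(19.5, 0.1, 0.3),
    ShotBoundary(27.0, 0.7, 0.6),
    ShotBoundary(33.2, 0.9, 0.2),  # high visual score, but not a true boundary
    ShotBoundary(42.0, 0.2, 0.9),
]
truth = [True, False, True, False, True]

predicted = [is_commercial_boundary(b) for b in boundaries]
precision, recall, f1 = precision_recall_f1(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Each shot boundary is treated as one sample that either is or is not the edge of an individual commercial, so detection quality can be summarized by precision, recall, and their harmonic mean F1, the measure reported in the abstract.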


Published in

MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639

Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
