Article

Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis

Authors:
Ling-Yu Duan (Institute for Infocomm Research, Singapore, and University of Newcastle, Australia)
Jinqiao Wang (Chinese Academy of Sciences, Beijing, China)
Yantao Zheng (Institute for Infocomm Research, Singapore)
Jesse S. Jin (University of Newcastle, Australia)
Hanqing Lu (Chinese Academy of Sciences, Beijing, China)
Changsheng Xu (Institute for Infocomm Research, Singapore)

MM '06: Proceedings of the 14th ACM international conference on Multimedia, October 2006, pages 201–210
https://doi.org/10.1145/1180639.1180697

Published: 23 October 2006

ABSTRACT

TV advertising is ubiquitous, persistent, and economically vital; TV commercials affect the living and working habits of millions of people. In this paper, we present a multimodal ("visual + audio + text") commercial video digest scheme that segments individual commercials from TV streams and carries out semantic content analysis within each detected commercial segment. Two challenging issues are addressed. First, we propose a multimodal approach to robustly detect the boundaries of individual commercials. Second, we classify each commercial according to the products or services it advertises. For the first issue, boundary detection of individual commercials is reduced to binary classification of shot boundaries using mid-level features derived from two concepts: Image Frames Marked with Product Information (FMPI) and the Audio Scene Change Indicator (ASCI). Moreover, accurate individual boundaries enable commercial identification by clip matching with a spatial-temporal signature. For the second issue, commercial classification is formulated as a text categorization task in which the sparse texts obtained from ASR/OCR are expanded with external knowledge. Our boundary detection achieves F1 = 93.7% on a dataset of 499 individual commercials from the TRECVID'05 video corpus, and commercial classification achieves an accuracy of 80.9% on 141 distinct commercials. These results enable applications such as an intelligent digital TV set-top box that enhances viewers' ability to monitor and manage commercials in TV streams.
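To make the idea of "boundary detection as binary classification of shot boundaries" concrete, here is a minimal sketch. It assumes each candidate shot boundary has already been summarized by two scalar mid-level scores, one FMPI-like and one ASCI-like; the class and function names, the score ranges, and the toy data are all hypothetical. A perceptron is used only to keep the example self-contained; the abstract does not state which classifier the paper uses.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ShotBoundary:
    """Hypothetical mid-level features for one candidate shot boundary."""
    fmpi: float  # assumed score in [0, 1]: how FMPI-like the frames next to the boundary are
    asci: float  # assumed score in [0, 1]: strength of the audio scene change at the boundary


def feature_vector(b: ShotBoundary) -> List[float]:
    # The trailing 1.0 acts as a bias input for the linear classifier.
    return [b.fmpi, b.asci, 1.0]


def train_perceptron(samples: Sequence[ShotBoundary], labels: Sequence[int],
                     epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Train a linear classifier; labels are +1 (commercial boundary) or -1 (not)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for b, y in zip(samples, labels):
            x = feature_vector(b)
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score > 0 else -1) != y:
                # Standard perceptron update: move w toward the misclassified sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return w


def is_commercial_boundary(w: List[float], b: ShotBoundary) -> bool:
    return sum(wi * xi for wi, xi in zip(w, feature_vector(b))) > 0


# Illustrative toy data, not taken from the paper.
train = [ShotBoundary(0.9, 0.8), ShotBoundary(0.8, 0.9), ShotBoundary(0.7, 0.7),
         ShotBoundary(0.1, 0.2), ShotBoundary(0.2, 0.1), ShotBoundary(0.3, 0.2)]
labels = [1, 1, 1, -1, -1, -1]
w = train_perceptron(train, labels)
print(is_commercial_boundary(w, ShotBoundary(0.85, 0.75)))  # True
```

The point of the formulation is that once each shot boundary is reduced to a small vector of mid-level cues, any binary classifier (for example an SVM) can be swapped in for the perceptron without changing the rest of the pipeline.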
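Identifying a commercial "by clip matching via a spatial-temporal signature" can be illustrated with a small sketch. The signature below is an assumption, not the paper's: each frame is split into a grid of blocks and described by the rank order of the block mean intensities (an ordinal signature, one common choice in video copy detection), and the temporal part is a sliding-window comparison of per-frame signatures. All names and parameters here are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities
Signature = Tuple[int, ...]


def ordinal_signature(frame: Frame, grid: int = 2) -> Signature:
    """Spatial part: rank of each grid block's mean intensity (row-major blocks).

    Assumes the frame is at least grid x grid pixels; leftover rows/columns
    that do not fill a whole block are ignored.
    """
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid, w // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            block = [frame[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            means.append(sum(block) / len(block))
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)


def sequence_distance(a: Sequence[Signature], b: Sequence[Signature]) -> float:
    """Temporal part: mean L1 distance between aligned per-frame rank vectors."""
    total = sum(sum(abs(x - y) for x, y in zip(sa, sb)) for sa, sb in zip(a, b))
    return total / len(a)  # query must contain at least one frame


def locate_clip(query: Sequence[Frame], stream: Sequence[Frame],
                grid: int = 2) -> Tuple[int, float]:
    """Slide the query signature over the stream; return (best offset, distance)."""
    q = [ordinal_signature(f, grid) for f in query]
    s = [ordinal_signature(f, grid) for f in stream]
    best_offset, best_dist = -1, float("inf")
    for off in range(len(s) - len(q) + 1):
        d = sequence_distance(q, s[off:off + len(q)])
        if d < best_dist:
            best_offset, best_dist = off, d
    return best_offset, best_dist
```

Because the signature keeps only rank order, a uniform brightness or contrast change (any increasing affine change of intensities) leaves it unchanged, which is useful when the same commercial is re-broadcast with slightly different encoding. A practical system would also need a distance threshold to decide "match" versus "no match"; that threshold is not specified here.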
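The idea of classifying a commercial by "expanding sparse texts from ASR/OCR with external knowledge" can be sketched as bag-of-words expansion followed by nearest-profile categorization. The abstract does not say which knowledge source or classifier the paper uses, so the expansion dictionary, the category profiles, and the cosine-similarity classifier below are hypothetical stand-ins chosen only to show why expansion helps: a brand name alone shares no words with a category profile, but its related terms do.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical external knowledge: maps a sparse ASR/OCR term to related terms.
KNOWLEDGE: Dict[str, List[str]] = {
    "toyota": ["car", "vehicle", "engine"],
    "pepsi": ["cola", "drink"],
}

# Hypothetical category profiles (term weights), e.g. learned from a labeled corpus.
CATEGORIES: Dict[str, Counter] = {
    "finance": Counter({"bank": 2, "loan": 1, "credit": 1}),
    "automobile": Counter({"car": 2, "engine": 1, "drive": 1}),
    "beverage": Counter({"drink": 2, "cola": 1, "juice": 1}),
}


def expand(tokens: Iterable[str], knowledge: Dict[str, List[str]]) -> Counter:
    """Bag of words from the original tokens plus their related terms."""
    bag: Counter = Counter()
    for t in tokens:
        bag[t] += 1
        bag.update(knowledge.get(t, []))
    return bag


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())  # missing keys in a Counter count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorize(tokens: Iterable[str], categories: Dict[str, Counter],
               knowledge: Dict[str, List[str]]) -> str:
    """Pick the category whose profile is most similar to the expanded text."""
    bag = expand(tokens, knowledge)
    return max(categories, key=lambda c: cosine(bag, categories[c]))


print(categorize(["toyota", "new"], CATEGORIES, KNOWLEDGE))  # automobile
```

Without expansion, the text "toyota new" has zero similarity to every profile, so any choice would be arbitrary; a real system would also want a reject option for that all-zero case.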
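The boundary-detection result is reported as F1, the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below shows one way to compute it for detected commercial boundaries. The abstract does not describe the evaluation protocol, so the greedy one-to-one matching and the 0.5 s tolerance are assumptions, and the example times are made up.

```python
from typing import List, Sequence, Tuple


def boundary_prf(detected: Sequence[float], truth: Sequence[float],
                 tol: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall and F1 for detected boundary times (seconds).

    A detection counts as a true positive if it lies within `tol` seconds
    (an assumed tolerance) of a not-yet-matched ground-truth boundary;
    matching is greedy and one-to-one.
    """
    unmatched: List[float] = sorted(truth)
    tp = 0
    for d in sorted(detected):
        candidates = [t for t in unmatched if abs(d - t) <= tol]
        if candidates:
            unmatched.remove(min(candidates, key=lambda t: abs(d - t)))
            tp += 1
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: one false alarm at 45 s and one missed boundary at 90 s.
truth = [0.0, 30.0, 60.0, 90.0, 120.0]
detected = [0.2, 30.1, 45.0, 59.8, 120.3]
print(boundary_prf(detected, truth))  # precision, recall and F1 are all 0.8 (up to rounding)
```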


Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction


Published in
MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006, 1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639
General Chairs: Klara Nahrstedt (UIUC), Matthew Turk (UCSB)
Program Chairs: Yong Rui (Microsoft Research), Wolfgang Klas (Universität Wien), Ketan Mayer-Patel (UNC)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
TV commercial
mid-level features
multimodal analysis
segmentation
semantics
text categorization
video classification
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%