Video text detection and localization in intra-frames of H.264/AVC compressed video

Qian, Xueming; Wang, Huan; Hou, Xingsong

doi:10.1007/s11042-012-1168-z

Published: 20 July 2012

Volume 70, pages 1487–1502, (2014)

Multimedia Tools and Applications

Abstract

Video text is closely related to video content, and it can facilitate content-based video analysis, indexing, and retrieval. Video sequences are usually compressed before storage and transmission, and a basic step in text-based applications is text detection and localization. This paper proposes an overlaid-text detection and localization method for H.264/AVC compressed video that uses the integer discrete cosine transform (DCT) coefficients of intra-frames. The main contributions are twofold: 1) coarse text block detection using thresholds that adapt to block size and quantization parameter; 2) text line localization based on the characteristics of text in intra-frames in the H.264/AVC compressed domain. The method is compared with a pixel-domain text detection method on H.264/AVC compressed video. Text detection results on five H.264/AVC video sequences at various quality levels show the effectiveness of the proposed method.
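The abstract does not give the thresholds themselves, so the following is only a minimal sketch of what a block-size and quantization-parameter (QP) adaptive test for coarse text blocks might look like. The function names, the `base` and `ref_qp` values, and the scaling rule are all assumptions made for illustration and are not taken from the paper. The sketch relies on one known property of H.264/AVC: the quantizer step size doubles for every increase of 6 in QP, so quantized coefficient levels shrink roughly by half.

```python
def ac_energy(block):
    """Sum of absolute AC coefficients of a square block.

    block[0][0] is treated as the DC term and excluded.
    """
    total = sum(abs(c) for row in block for c in row)
    return total - abs(block[0][0])


def adaptive_threshold(block_size, qp, base=40.0, ref_qp=28):
    """Hypothetical threshold on the AC energy of quantized levels.

    Assumption: the threshold halves for every +6 in QP (mirroring the
    doubling of the quantizer step) and grows with the number of AC
    coefficients in the block (15 for 4x4, 63 for 8x8).
    """
    qp_scale = 2.0 ** ((ref_qp - qp) / 6.0)
    size_scale = (block_size * block_size - 1) / 15.0
    return base * qp_scale * size_scale


def is_text_candidate(block, qp):
    """Flag a block as a coarse text candidate if its AC energy is high."""
    return ac_energy(block) > adaptive_threshold(len(block), qp)
```

A flat block (only a DC term) would be rejected, while a block with many sizeable AC levels would be flagged; the flagged blocks would then be passed to a separate text line localization stage, which this sketch does not attempt.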

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC), Project Nos. 60903121 and 61173109, and by the Foundations of Microsoft Research Asia.

Author information

Authors and Affiliations

Xi’an Jiaotong University, Xi’an, China
Xueming Qian, Huan Wang & Xingsong Hou

Corresponding author

Correspondence to Xueming Qian.

About this article

Cite this article

Qian, X., Wang, H. & Hou, X. Video text detection and localization in intra-frames of H.264/AVC compressed video. Multimed Tools Appl 70, 1487–1502 (2014). https://doi.org/10.1007/s11042-012-1168-z

Issue Date: June 2014