
Video text detection and localization in intra-frames of H.264/AVC compressed video

Multimedia Tools and Applications

Abstract

Video text is closely related to video content, and video text information can facilitate content-based video analysis, indexing and retrieval. Video sequences are usually compressed before storage and transmission, and text detection and localization are basic steps in text-based applications. In this paper, an overlaid text detection and localization method is proposed for H.264/AVC compressed videos that uses the integer discrete cosine transform (DCT) coefficients of intra-frames. The main contributions of this paper are twofold: 1) coarse text block detection using thresholds that adapt to the transform block size and the quantization parameter; 2) text line localization based on the characteristics of text in intra-frames in the H.264/AVC compressed domain. The method is compared with a pixel-domain text detection method on H.264/AVC compressed video. Text detection results on five H.264/AVC video sequences encoded at various quality levels show the effectiveness of the proposed method.
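The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the threshold rule (`alpha * Qstep(QP) * block_area`), the parameter values, and the row-projection line grouping are hypothetical choices made for illustration. To stay self-contained, the sketch recomputes the H.264/AVC 4x4 integer core transform from raw luma pixels; a true compressed-domain implementation would instead parse the (intra-prediction residual) coefficient levels from the bitstream and would also handle 8x8 transform blocks, which is where the block-size term of the threshold would matter.

```python
import numpy as np

# H.264/AVC 4x4 forward core transform matrix. The post-scaling that
# H.264 folds into quantization is omitted, so coefficients are unscaled.
CF4 = np.array([[1, 1, 1, 1],
                [2, 1, -1, -2],
                [1, -1, -1, 1],
                [1, -2, 2, -1]], dtype=np.int64)

# Quantizer step sizes for QP 0..5; the step doubles every 6 QP values.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a given QP (0..51)."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def core_transform_4x4(block):
    """Unscaled integer core transform W = Cf X Cf^T of a 4x4 block."""
    return CF4 @ block.astype(np.int64) @ CF4.T


def ac_energy(coeffs):
    """Sum of absolute AC coefficients (everything except DC at [0, 0])."""
    return int(np.abs(coeffs).sum() - abs(coeffs[0, 0]))


def adaptive_threshold(qp, block_size, alpha=4.0):
    """HYPOTHETICAL threshold that grows with Qstep and block area."""
    return alpha * qstep(qp) * block_size * block_size


def coarse_text_blocks(luma, qp, alpha=4.0):
    """Mark 4x4 blocks whose AC energy exceeds the adaptive threshold."""
    rows, cols = luma.shape[0] // 4, luma.shape[1] // 4
    thr = adaptive_threshold(qp, 4, alpha)
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            blk = luma[4 * r:4 * r + 4, 4 * c:4 * c + 4]
            mask[r, c] = ac_energy(core_transform_4x4(blk)) > thr
    return mask


def localize_text_lines(mask, min_blocks=3):
    """Group consecutive block rows with enough text blocks into line boxes.

    Returns (top_row, bottom_row, left_col, right_col) in block units.
    """
    row_counts = mask.sum(axis=1)
    boxes = []
    r = 0
    while r < mask.shape[0]:
        if row_counts[r] >= min_blocks:
            start = r
            while r < mask.shape[0] and row_counts[r] >= min_blocks:
                r += 1
            cols = np.where(mask[start:r].any(axis=0))[0]
            boxes.append((start, r - 1, int(cols.min()), int(cols.max())))
        else:
            r += 1
    return boxes


if __name__ == "__main__":
    # Synthetic 64x64 frame: flat gray with a high-contrast stripe that
    # mimics a caption (alternating black/white columns).
    luma = np.full((64, 64), 128, dtype=np.uint8)
    luma[16:24, 8:56:2] = 255
    luma[16:24, 9:56:2] = 0
    mask = coarse_text_blocks(luma, qp=28)
    print(localize_text_lines(mask))  # → [(4, 5, 2, 13)]
```

Flat 4x4 blocks produce only a DC coefficient, so their AC energy is zero, while the striped blocks carry strong horizontal-frequency energy. Because the hypothetical threshold scales with `qstep(qp)`, the same stripe that passes at QP 28 falls below the threshold at QP 51; choosing how fast the threshold should grow with QP is exactly the kind of tuning the paper's adaptive thresholds address.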


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (NSFC) under Project Nos. 60903121 and 61173109, and by Foundations of Microsoft Research Asia.

Author information


Corresponding author

Correspondence to Xueming Qian.


About this article

Cite this article

Qian, X., Wang, H. & Hou, X. Video text detection and localization in intra-frames of H.264/AVC compressed video. Multimed Tools Appl 70, 1487–1502 (2014). https://doi.org/10.1007/s11042-012-1168-z



  • DOI: https://doi.org/10.1007/s11042-012-1168-z
