Texture feature benchmarking and evaluation for historical document image analysis

Mehri, Maroua; Héroux, Pierre; Gomez-Krämer, Petra; Mullot, Rémy

doi:10.1007/s10032-016-0278-y

Original Paper
Published: 05 January 2017

International Journal on Document Analysis and Recognition (IJDAR), Volume 20, pages 1–35 (2017)

Maroua Mehri¹ (ORCID: orcid.org/0000-0002-4763-8584),
Pierre Héroux¹,
Petra Gomez-Krämer² &
Rémy Mullot²

Abstract

Texture-based methods are pervasive across the subfields and tasks of document image analysis (DIA), and particularly in historical DIA (HDIA). Given the wide variety of texture-based methods used for HDIA, two questions arise. First, which texture methods are well suited to segmenting graphical content from textual content, discriminating between text fonts and scales, and separating different types of graphics? Second, which texture-based method offers a good compromise between performance and computational cost? To answer these questions, this article benchmarks the most classical and widely used texture-based feature sets using a classical texture-based pixel-labeling scheme on a large corpus of historical documents. We focus on determining the performance of each texture-based feature set according to the document content. The results reported in this study provide, first, a qualitative measure of which texture-based feature sets are the most appropriate and, second, a useful benchmark in terms of performance and computational cost for current and future research efforts in HDIA.
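To make the idea of benchmarking a texture feature within a pixel-labeling scheme more concrete, the following is a minimal sketch and not the pipeline used in the article. It assumes a single illustrative feature (gray-level variance in a small window), an illustrative two-cluster labeling step, and a toy synthetic page; all function names (`make_synthetic_page`, `local_variance`, `two_means_labels`, `pixel_accuracy`, `benchmark`) are hypothetical. It reports the two quantities the abstract contrasts: labeling performance and computational cost.

```python
import time


def make_synthetic_page(size=8):
    """Toy grayscale page (an assumption for illustration only):
    flat background on the left half, checkerboard texture on the right half.
    Returns the image and a ground-truth mask (0 = background, 1 = textured)."""
    half = size // 2
    image = [[200 if x < half else (255 if (x + y) % 2 == 0 else 0)
              for x in range(size)] for y in range(size)]
    truth = [[0 if x < half else 1 for x in range(size)] for y in range(size)]
    return image, truth


def local_variance(image, radius=1):
    """Per-pixel texture feature: gray-level variance in a window of
    side 2 * radius + 1, clipped at the image borders."""
    h, w = len(image), len(image[0])
    feats = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            feats[y][x] = sum((v - mean) ** 2 for v in window) / len(window)
    return feats


def two_means_labels(feats, iters=10):
    """Label each pixel 0 (low feature value) or 1 (high feature value)
    using one-dimensional k-means with k = 2."""
    flat = [v for row in feats for v in row]
    c0, c1 = min(flat), max(flat)
    for _ in range(iters):
        low = [v for v in flat if abs(v - c0) <= abs(v - c1)]
        high = [v for v in flat if abs(v - c0) > abs(v - c1)]
        if low:
            c0 = sum(low) / len(low)
        if high:
            c1 = sum(high) / len(high)
    return [[0 if abs(v - c0) <= abs(v - c1) else 1 for v in row]
            for row in feats]


def pixel_accuracy(labels, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(1 for lr, tr in zip(labels, truth)
                  for l, t in zip(lr, tr) if l == t)
    return correct / total


def benchmark(feature_fn, image, truth):
    """Return (pixel accuracy, feature extraction time in seconds)."""
    start = time.perf_counter()
    feats = feature_fn(image)
    elapsed = time.perf_counter() - start
    return pixel_accuracy(two_means_labels(feats), truth), elapsed


if __name__ == "__main__":
    page, mask = make_synthetic_page()
    accuracy, seconds = benchmark(local_variance, page, mask)
    print(f"accuracy={accuracy:.2f} extraction_time={seconds:.6f}s")
```

Passing a different feature function to `benchmark` on the same image and ground truth gives a like-for-like comparison of accuracy against extraction time, which is the kind of trade-off the abstract describes.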

Notes

1. http://gallica.bnf.fr
2. http://www.primaresearch.org/datasets
3. The DIGIDOC-Texture dataset and its ground truth are temporarily available at http://litis-digidoc.univ-rouen.fr/texture/DIGIDOC-Texture.tar.gz. The dataset is available on request, subject to the agreement of the French national library, the Bibliothèque nationale de France (BnF).
4. http://gedigroundtruth.sourceforge.net/

Acknowledgements

This study was supported by the French national research agency (ANR) under Grant ANR-10-CORD-0020, which is gratefully acknowledged. The authors would also like to thank Geneviève Cron and Christos Papadopoulos for providing access to the Gallica digital library (see Note 1) and the IMPACT dataset (see Note 2), respectively.

Author information

Authors and Affiliations

Normandie Univ, UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, 76000, Rouen, France
Maroua Mehri & Pierre Héroux
L3i EA 2118, University of La Rochelle, Avenue Michel Crépeau, 17042, La Rochelle, France
Petra Gomez-Krämer & Rémy Mullot

Corresponding author

Correspondence to Maroua Mehri.

About this article

Cite this article

Mehri, M., Héroux, P., Gomez-Krämer, P. et al. Texture feature benchmarking and evaluation for historical document image analysis. IJDAR 20, 1–35 (2017). https://doi.org/10.1007/s10032-016-0278-y

Received: 13 February 2016
Revised: 28 September 2016
Accepted: 19 December 2016
Published: 05 January 2017
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10032-016-0278-y
