Document Layout Analysis: A Comprehensive Survey

Authors:
Galal M. Binmakhashen, King Fahd University of Petroleum and Minerals, Kingdom of Saudi Arabia (ORCID 0000-0002-5111-9760)
Sabri A. Mahmoud, King Fahd University of Petroleum and Minerals, Kingdom of Saudi Arabia (ORCID 0000-0002-5432-3206)

ACM Computing Surveys, Volume 52, Issue 6, Article No. 109, pp. 1–36. https://doi.org/10.1145/3355610

Published: 16 October 2019

Abstract

Document layout analysis (DLA) is a preprocessing step of document understanding systems. It is responsible for detecting and annotating the physical structure of documents. DLA has several important applications, such as document retrieval, content categorization, and text recognition. The objective of DLA is to ease the subsequent analysis and recognition phases by identifying the homogeneous blocks of a document and determining their relationships. The DLA pipeline consists of several phases that can vary among DLA methods, depending on the documents’ layouts and the final analysis objectives. A universal DLA algorithm that fits all types of document layouts or satisfies all analysis objectives has not yet been developed. In this survey, we present a critical study of different document layout analysis techniques. The study highlights the motivations for pursuing DLA and discusses comprehensively the different phases of DLA algorithms based on a general framework formed as an outcome of reviewing the research in the field. The DLA framework consists of preprocessing, layout analysis strategies, post-processing, and performance evaluation phases. Overall, the article delivers an essential baseline for pursuing further research in document layout analysis.

References

Mudit Agrawal and David Doermann. 2009. Voronoi++: A dynamic page segmentation approach based on Voronoi and Docstrum features. In The International Conference on Document Analysis and Recognition. IEEE, 1011--1015.Google ScholarDigital Library
Mudit Agrawal and David Doermann. 2010. Context-aware and content-based dynamic Voronoi page segmentation. In The 8th IAPR International Workshop on Document Analysis Systems. ACM Press, New York, 73--80.Google ScholarDigital Library
Prakash K. Aithal, G. Rajesh, Dinesh U. Acharya, and P. C. Siddalingaswamy. 2013. A fast and novel skew estimation approach using radon transform. International Journal of Computer Information Systems and Industrial Management Applications 5 (2013), 337--344.Google Scholar
Alireza Alaei, Umapada Pal, and P. Nagabhushan. 2011. A new scheme for unconstrained handwritten text-line segmentation. Pattern Recognition 44, 4 (2011), 917--928.Google ScholarDigital Library
Michele Alberti, Mathias Seuret, Vinaychandran Pondenkandath, Rolf Ingold, and Marcus Liwicki. 2017. Historical document image segmentation with LDA-initialized deep neural networks. In The 4th International Workshop on Historical Document Imaging and Processing. ACM, 95--100.Google ScholarDigital Library
Adnan Amin and Sue Wu. 2005. A robust system for thresholding and skew detection in mixed text/graphics documents. International Journal of Image and Graphics 5, 2 (Apr. 2005), 247--265.Google ScholarCross Ref
Khalid M. Amin, Mohamed Abd Elfattah, Aboul Ella Hassanien, and Gerald Schaefer. 2014. A binarization algorithm for historical Arabic manuscript images using a neutrosophic approach. In The 9th International Conference on Computer Engineering 8 Systems. IEEE, 266--270.Google ScholarCross Ref
A. Antonacopoulos, C. Clausner, C. Papadopoulos, and S. Pletschacher. 2011. Historical document layout analysis competition. In International Conference on Document Analysis and Recognition. IEEE, 1516--1520.Google Scholar
A. Antonacopoulos, B. Gatos, and D. Karatzas. 2003. ICDAR 2003 page segmentation competition. In The 7th International Conference on Document Analysis and Recognition. 688--692.Google Scholar
A. Antonacopoulos and R. T. Ritchings. 1995. Representation and classification of complex-shaped printed regions using white tiles. In The 3rd International Conference on Document Analysis and Recognition, Vol. 2. IEEE Comput. Soc. Press, 1132--1135.Google Scholar
A. Antonacopoulos, S. Pletschacher, D. Bridson, and C. Papadopoulos. 2009. ICDAR2009 page segmentation competition. In The 10th International Conference on Document Analysis and Recognition. 1370--1374.Google Scholar
Apostolos Antonacopoulos and David Bridson. 2007. Performance analysis framework for layout analysis methods. In The 9th International Conference on Document Analysis and Recognition (ICDAR), Vol. 2. IEEE, 1258--1262.Google ScholarCross Ref
Apostolos Antonacopoulos, David Bridson, Christos Papadopoulos, and Stefan Pletschacher. 2009. A realistic dataset for performance evaluation of document layout analysis. In The 10th International Conference on Document Analysis and Recognition. IEEE, 296--300. DOI:https://doi.org/10.1109/ICDAR.2009.271Google ScholarDigital Library
Apostolos Antonacopoulos, Christian Clausner, Christos Papadopoulos, and Stefan Pletschacher. 2013. ICDAR2013 competition on historical newspaper layout analysis (HNLA’13). In The 12th International Conference on Document Analysis and Recognition. IEEE, 1454--1458.Google Scholar
Apostolos Antonacopoulos, Christian Clausner, Christos Papadopoulos, and Stefan Pletschacher. 2015. ICDAR2015 competition on recognition of documents with complex layouts. In The 13th International Conference on Document Analysis and Recognition. IEEE, 1151--1155.Google Scholar
Manivannan Arivazhagan, Harish Srinivasan, and Sargur Srihari. 2007. A statistical approach to line segmentation in handwritten documents. In Document Recognition and Retrieval XIV, Xiaofan Lin and Berrin A. Yanikoglu (Eds.). International Society for Optics and Photonics, 65000T.Google Scholar
Nikolaos Arvanitopoulos and Sabine Susstrunk. 2014. Seam carving for text line extraction on color and grayscale historical manuscripts. In The 14th International Conference on Frontiers in Handwriting Recognition. IEEE, 726--731.Google ScholarCross Ref
Abedelkadir Asi, Rafi Cohen, Klara Kedem, and Jihad El-Sana. 2015. Simplifying the reading of historical manuscripts. In The 13th International Conference on Document Analysis and Recognition. IEEE, 826--830. DOI:https://doi.org/10.1109/ICDAR.2015.7333877Google ScholarDigital Library
Abedelkadir Asi, Rafi Cohen, Klara Kedem, Jihad El-Sana, and Itshak Dinstein. 2014. A coarse-to-fine approach for layout analysis of ancient manuscripts. In The 14th International Conference on Frontiers in Handwriting Recognition. 140--145.Google ScholarCross Ref
Abedelkadir Asi, Raid Saabni, and Jihad El-Sana. 2011. Text line segmentation for gray scale historical document images. In The Workshop on Historical Document Imaging and Processing. ACM Press, New York, 120.Google ScholarDigital Library
Bruno Tenório Ávila and Rafael Dueire Lins. 2005. A fast orientation and skew detection algorithm for monochromatic document images. In The ACM Symposium on Document Engineering. ACM Press, New York, 118.Google Scholar
Micheal Baechler, Marcus Liwicki, and Rolf Ingold. 2013. Text line extraction using DMLP classifiers for historical manuscripts. In The 12th International Conference on Document Analysis and Recognition. IEEE, 1029--1033. DOI:https://doi.org/10.1109/ICDAR.2013.206Google ScholarDigital Library
A. Bagdanov and J. Kanai. 1997. Projection profile based skew estimation algorithm for JBIG compressed images. In The 4th International Conference on Document Analysis and Recognition, Vol. 1. IEEE Comput. Soc., 401--405. DOI:https://doi.org/10.1109/ICDAR.1997.619878Google Scholar
Itay Bar-Yosef, Nate Hagbi, Klara Kedem, and Itshak Dinstein. 2009. Line segmentation for degraded handwritten historical documents. In The 10th International Conference on Document Analysis and Recognition. IEEE, 1161--1165. http://ieeexplore.ieee.org/document/5277595/.Google ScholarDigital Library
P. Barlas, S. Adam, C. Chatelain, and T. Paquet. 2014. A typed and handwritten text block segmentation system for heterogeneous and complex documents. In The 11th IAPR International Workshop on Document Analysis Systems. IEEE, 46--50.Google Scholar
J. Bernsen. 1986. Dynamic thresholding of gray level images. In The International Conference on Pattern Recognition. 1251--1255.Google Scholar
Fadi Biadsy, Jihad El-Sana, and Nizar Habash. 2006. Online Arabic handwriting recognition using hidden Markov models. In The 10th International Workshop on Frontiers in Handwriting Recognition. Suvisoft.Google Scholar
Thomas M. Breuel. 2003. High performance document layout analysis. In Symposium on Document Image Understanding Technology 3 (2003), 209--218.Google Scholar
D. Bridson and A. Antonacopoulos. 2008. A geometric approach for accurate and efficient performance evaluation of layout analysis methods. In The 19th International Conference on Pattern Recognition. IEEE, 1--4.Google Scholar
Syed Saqib Bukhari, Mayce Ibrahim Ali Al Azawi, Faisal Shafait, and Thomas M. Breuel. 2010. Document image segmentation using discriminative learning over connected components. In The 8th IAPR International Workshop on Document Analysis Systems. ACM Press, New York, 183--190.Google Scholar
Syed Saqib Bukhari, T. M. Breuel, Abedelkadir Asi, and Jihad El-Sana. 2012. Layout analysis for arabic historical document images using machine learning. In The International Conference on Frontiers in Handwriting Recognition. IEEE, 639--644.Google ScholarDigital Library
Syed Saqib Bukhari, Faisal Shafait, and Thomas M. Breuel. 2009. Script-independent handwritten textlines segmentation using active contours. In The 10th International Conference on Document Analysis and Recognition. IEEE, 446--450. http://ieeexplore.ieee.org/document/5277636/.Google Scholar
Syed Saqib Bukhari, Faisal Shafait, and Thomas M. Breuel. 2011. Improved document image segmentation algorithm using multiresolution morphology. In International Society for Optics and Photonics, Gady Agam and Christian Viard-Gaudin (Eds.). International Society for Optics and Photonics, 78740D.Google Scholar
Marius Bulacu, Rutger Van Koert, Lambert Schomaker, and Tijn van der Zant. 2007. Layout analysis of handwritten historical documents for searching the archive of the cabinet of the Dutch Queen. In The 9th International Conference on Document Analysis and Recognition. IEEE, 351--361.Google Scholar
Mark J. Burge and Gladys Monagan. 1995. Using the Voronoi tessellation for grouping words and multipart symbols in documents. In The SPIE International Symposium on Optics, Imaging and Instrumentation, Robert A. Melter, Angela Y. Wu, Fred L. Bookstein, and William D. K. Green (Eds.). International Society for Optics and Photonics, 116--124.Google Scholar
C. Clausner A. Antonacopoulos C. Papadopoulos, S. Pletschacher. 2013. The IMPACT dataset of historical document images. In The 2nd International Workshop on Historical Document Imaging and Processing. 123--130.Google ScholarDigital Library
Yang Cao, Shuhua Wang, and Heng Li. 2003. Skew detection and correction in document images based on straight-line fitting. Pattern Recognition Letters 24, 12 (2003), 1871--1879.Google ScholarDigital Library
Samuele Capobianco, Leonardo Scommegna, and Simone Marinai. 2018. Historical handwritten document segmentation by using a weighted loss. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition. Springer, 395--406.Google ScholarCross Ref
R. Cattoni, T. Coianiz, S. Messelodi, and Cm Modena. 1998. Geometric layout analysis techniques for document image understanding: A review. ITC-First Technical Report (1998), 1--68.Google Scholar
F. Cesarini, M. Gori, S. Marinai, and G. Soda. 1999. Structured document segmentation and representation by the modified X-Y tree. In The 5th International Conference on Document Analysis and Recognition. IEEE, 563--566.Google Scholar
Nabendu Chaki, Soharab Hossain Shaikh, and Khalid Saeed. 2014. A comprehensive survey on image binarization techniques. In Exploring Image Binarization Techniques. Springer India, 5--15.Google Scholar
Kai Chen, Cheng-Lin Liu, Mathias Seuret, Marcus Liwicki, Jean Hennebert, and Rolf Ingold. 2016. Page segmentation for historical document images based on superpixel classification with unsupervised feature learning. In The 12th IAPR Workshop on Document Analysis Systems (DAS). IEEE, 299--304.Google ScholarCross Ref
Kai Chen, Mathias Seuret, Jean Hennebert, and Rolf Ingold. 2017. Convolutional neural networks for page segmentation of historical document images. In The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 965--970.Google ScholarCross Ref
Kai Chen, Mathias Seuret, Marcus Liwicki, Jean Hennebert, and Rolf Ingold. 2015. Page segmentation of historical document images with convolutional autoencoders. In The 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 1011--1015.Google ScholarDigital Library
Kai Chen, Hao Wei, Jean Hennebert, Rolf Ingold, and Marcus Liwicki. 2014. Page segmentation for historical handwritten document images using color and texture features. In The 14th International Conference on Frontiers in Handwriting Recognition. 488--493.Google ScholarCross Ref
Yiping Chen and Liansheng Wang. 2017. Broken and degraded document images binarization. Neurocomputing 237 (2017), 272--280.Google ScholarDigital Library
Atul K. Chhabra and Ihsin T. Phillips. 1997. The second international graphics recognition contest-raster to vector conversion: A report. In International Workshop on Graphics Recognition. Springer, 390--410.Google Scholar
Rafi Cohen, Abedelkadir Asi, Klara Kedem, Jihad El-Sana, and Itshak Dinstein. 2013. Robust text and drawing segmentation algorithm for historical documents. In The 2nd International Workshop on Historical Document Imaging and Processing. 110--117.Google ScholarDigital Library
Rafi Cohen, Itshak Dinstein, Jihad El-Sana, and Klara Kedem. 2014. Using scale-space anisotropic smoothing for text line extraction in historical documents. In International Conference Image Analysis and Recognition. Springer International Publishing, 349--358.Google ScholarCross Ref
Laboratoire National de metrologie et d’Essais (LNE). 2013. MAURDOR campaign. http://www.maurdor-campaign.org/index.php?id=83&L==1.Google Scholar
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. 248--255.Google ScholarCross Ref
Markus Diem, Florian Kleber, and Robert Sablatnig. 2011. Text classification and document layout analysis of paper fragments. In The International Conference on Document Analysis and Recognition. IEEE, 854--858. DOI:https://doi.org/10.1109/ICDAR.2011.175Google ScholarDigital Library
Markus Diem, Florian Kleber, and Robert Sablatnig. 2012. Skew estimation of sparsely inscribed document fragments. In The 10th IAPR International Workshop on Document Analysis Systems. IEEE, 292--296.Google ScholarDigital Library
David Doermann Elena Zotkina, Himanshu Suri. 2013. GEDI: Groundtruthing Environment for Document Images. https://lampsrv02.umiacs.umd.edu/projdb/project.php?id=53.Google Scholar
Boris Epshtein. 2011. Determining document skew using inter-line spaces. In The International Conference on Document Analysis and Recognition. IEEE, 27--31. DOI:https://doi.org/10.1109/ICDAR.2011.15Google ScholarDigital Library
Sébastien Eskenazi, Petra Gomez-Krämer, and Jean-Marc Ogier. 2015. The Delaunay document layout descriptor. In ACM Symposium on Document Engineering. ACM Press, New York, 167--175.Google ScholarDigital Library
Sébastien Eskenazi, Petra Gomez-Krämer, and Jean-Marc Ogier. 2017. A comprehensive survey of mostly textual document segmentation algorithms since 2008. Pattern Recognition 64 (2017), 1--14.Google ScholarDigital Library
Jonathan Fabrizio. 2014. A precise skew estimation algorithm for document images using KNN clustering and Fourier transform. In The International Conference on Image Processing. IEEE, 2585--2588.Google ScholarCross Ref
Andreas Fischer, Micheal Baechler, Angelika Garz, Marcus Liwicki, and Rolf Ingold. 2014. A combined system for text line extraction and handwriting recognition in historical documents. In The 11th IAPR International Workshop on Document Analysis Systems. 71--75.Google ScholarDigital Library
Andreas Fischer, Volkmar Frinken, Alicia Fornés, and Horst Bunke. 2011. Transcription alignment of latin manuscripts using hidden Markov models. In The Workshop on Historical Document Imaging and Processing. ACM, 29--36.Google ScholarDigital Library
Gaofeng Meng, Chunhong Pan, Nanning Zheng, and Chen Sun. 2010. Skew estimation of document images using bagging. IEEE Transactions on Image Processing 19, 7 (jul 2010), 1837--1846.Google Scholar
Angelika Garz, Markus Diem, and Robert Sablatnig. 2010. Detecting text areas and decorative elements in ancient manuscripts. In The 12th International Conference on Frontiers in Handwriting Recognition. IEEE, 176--181. DOI:https://doi.org/10.1109/ICFHR.2010.35Google ScholarDigital Library
Angelika Garz, Andreas Fischer, Robert Sablatnig, and Horst Bunke. 2012. Binarization-free text line segmentation for historical documents based on interest point clustering. In The 10th IAPR International Workshop on Document Analysis Systems. IEEE, 95--99.Google ScholarDigital Library
Angelika Garz and Robert Sablatnig. 2010. Multi-scale texture-based text recognition in ancient manuscripts. In The 16th International Conference on Virtual Systems and Multimedia. IEEE, 336--339. DOI:https://doi.org/10.1109/VSMM.2010.5665938Google ScholarCross Ref
Angelika Garz, Robert Sablatnig, and Markus Diem. 2011. Layout analysis for historical manuscripts using SIFT features. In The International Conference on Document Analysis and Recognition. 508--512.Google ScholarDigital Library
B. Gatos, N. Papamarkos, and C. Chamzas. 1997. Skew detection and text line position determination in digitized documents. Pattern Recognition 30, 9 (1997), 1505--1519.Google ScholarCross Ref
B. Gatos, N. Stamatopoulos, and G. Louloudis. 2011. ICDAR2009 handwriting segmentation contest. International Journal on Document Analysis and Recognition (IJDAR) 14, 1 (2011), 25--33.Google ScholarDigital Library
Basilios Gatos, Pratikakis Ioannis, and Stavros J. Perantonis. 2004. An adaptive binarization technique for low quality historical documents. In Document Analysis Systems VI. Springer, Springer Berlin, 102--113.Google Scholar
Basilis Gatos, Nikolaos Stamatopoulos, and Georgios Louloudis. 2010. ICFHR2010 handwriting segmentation contest. In The 12th International Conference on Frontiers in Handwriting Recognition. IEEE, 737--742.Google Scholar
Tobias Grüning, Gundram Leifert, Tobias Strauß, and Roger Labahn. 2018. A two-stage method for text line detection in historical documents. arXiv preprint arXiv:1802.03345 (2018).Google Scholar
Karim Hadjar and Rolf Ingold. 2004. Physical layout analysis of complex structured arabic documents using artificial neural nets. In Lecture Notes in Computer Science. Springer Berlin, 170--178.Google Scholar
Sheng He and Lambert Schomaker. 2019. DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognition 91 (2019), 379--390.Google ScholarDigital Library
S. C. Hinds, J. L. Fisher, and D. P. D’Amato. 1990. A document skew detection method using run-length encoding and the hough transform. In The 10th International Conference on Pattern Recognition, Vol. I. IEEE Comput. Soc. Press, 464--468.Google Scholar
Jaekyu Ha, R. M. Haralick, and I. T. Phillips. 1995. Document page decomposition by the bounding-box project. In The 3rd International Conference on Document Analysis and Recognition. IEEE Comput. Soc. Press, 1119--1122.Google Scholar
Anil K. Jain and Yu Zhong. 1996. Page segmentation using texture analysis. Pattern Recognition 29, 5 (May 1996), 743--770.Google ScholarDigital Library
N. Journet, V. Eglin, J. Y. Ramel, and R. Mullot. 2005. Text/graphic labelling of ancient printed documents. In The 8th International Conference on Document Analysis and Recognition. IEEE, 1010--1014 Vol. 2. DOI:https://doi.org/10.1109/ICDAR.2005.235Google Scholar
Nicholas Journet, Jean-Yves Ramel, Rémy Mullot, and Véronique Eglin. 2008. Document image characterization using a multiresolution analysis of the texture: Application to old documents. International Journal of Document Analysis and Recognition (IJDAR) 11, 1 (Jun 2008), 9--18.Google ScholarDigital Library
Hao Wei Marcus Liwicki Rolf Ingold Kai Chen, Mathias Seuret. 2015. Document, image, and video analysis DLA tool. http://diuf.unifr.ch/main/hisdoc/divadia.Google Scholar
Rangachar Kasturi, Lawrence O’Gorman, and Venu Govindaraju. 2002. Document image analysis: A primer. Sadhana 27, 1 (2002), 3--22.Google ScholarCross Ref
N. Khorissi, A. Namane, A. Mellit, F. Abdati, Z. A. Bensalama, and A. Guessoum. 2007. Application of the wavelet and the Hough transform for detecting the skew angle in arabic printed documents. In The 9th International Symposium on Signal Processing and Its Applications. IEEE, 1--4. http://ieeexplore.ieee.org/document/4555586/.Google Scholar
Koichi Kise, Akinori Sato, and Motoi Iwata. 1998. Segmentation of page images using the area Voronoi diagram. Computer Vision and Image Understanding 70, 3 (1998), 370--382.Google ScholarDigital Library
Koichi Kise. 2014. Page segmentation techniques in document analysis. In Handbook of Document Image Processing and Recognition. Springer London, London, 135--175.Google Scholar
K. Kise, A. Sato, and K. Matsumoto. 1997. Document image segmentation as selection of Voronoi edges. In The Workshop on Document Image Analysis. IEEE Comput. Soc, 32--39. DOI:https://doi.org/10.1109/DIA.1997.627089Google Scholar
Florian Kleber, Robert Sablatnig, Melanie Gau, and Heinz Miklas. 2008. Ancient document analysis based on text line extraction. In The 19th International Conference on Pattern Recognition. IEEE, 1--4. DOI:https://doi.org/10.1109/ICPR.2008.4761530Google ScholarCross Ref
M. Krishnamoorthy, G. Nagy, S. Seth, and M. Viswanathan. 1993. Syntactic segmentation and labeling of digitized pages from technical journals. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 7 (1993), 737--747.Google ScholarDigital Library
Victor Lavrenko, Toni M. Rath, and Raghavan Manmatha. 2004. Holistic word recognition for handwritten historical documents. In The 1st International Workshop on Document Image Analysis for Libraries. IEEE, 278--287.Google ScholarCross Ref
Daniel S. Le, George R. Thoma, and Harry Wechsler. 1994. Automated page orientation and skew angle detection for binary document images. Pattern Recognition 27, 10 (1994), 1325--1344.Google ScholarCross Ref
Shutao Li, Qinghua Shen, and Jun Sun. 2007. Skew detection using wavelet decomposition and projection profile analysis. Pattern Recognition Letters 28, 5 (2007), 555--562.Google ScholarDigital Library
L. Likforman-Sulem, A. Hanimyan, and C. Faure. 1995. A Hough based algorithm for extracting text lines in handwritten documents. In The 3rd International Conference on Document Analysis and Recognition. IEEE Comput. Soc. Press, 774--777.Google Scholar
Laurence Likforman-Sulem, Abderrazak Zahour, and Bruno Taconet. 2007. Text line segmentation of historical documents: A survey. International Journal of Document Analysis and Recognition (IJDAR) 9, 2--4 (Sept. 2007), 123--138. http://link.springer.com/10.1007/s10032-006-0023-zGoogle ScholarCross Ref
N. Liolios, N. Fakotakis, and G. Kokkinakis. 2001. Improved document skew detection based on text line connected-component clustering. In The International Conference on Image Processing (Cat. No.01CH37205), Vol. 1. IEEE, 1098--1101.Google Scholar
G. Louloudis, B. Gatos, and C. Halatsis. 2007. Text line detection in unconstrained handwritten documents using a block-based Hough transform approach. In The 9th International Conference on Document Analysis and Recognition. IEEE, 599--603.Google Scholar
G. Louloudis, B. Gatos, I. Pratikakis, and C. Halatsis. 2009. Text line and word segmentation of handwritten documents. Pattern Recognition 42, 12 (2009), 3169--3183.Google ScholarDigital Library
Scott Lowther, Vinod Chandran, and Subramanian Sridharan. 2002. An accurate method for skew determination in document images. In Digital Image Computing Techniques and Applications, Vol. 1. 25--29.Google Scholar
Yue Lu and Chew Lim Tan. 2003. A nearest-neighbor chain based approach to skew estimation in document images. Pattern Recognition Letters 24, 14 (2003), 2315--2323.Google ScholarDigital Library
Yue Lu, Zhe Wang, and Chew Lim Tan. 2004. Word grouping in document images based on Voronoi tessellation. In International Workshop on Document Analysis Systems. Springer Berlin, 147--157.Google ScholarCross Ref
Simon M. Lucas. 2005. ICDAR 2005 text locating competition results. In The 8th International Conference on Document Analysis and Recognition. IEEE, 80--84.Google Scholar
Song Mao, Azriel Rosenfeld, and Tapas Kanungo. 2003. Document structure analysis algorithms: A literature survey. SPIE 5010, Document Recognition and Retrieval X 5010, 1 (2003), 197.Google Scholar
Simone Marinai, Marco Gori, and Giovanni Soda. 2005. Artificial neural networks for document analysis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1 (2005), 23--35.Google ScholarDigital Library
Gale L. Martin. 1993. Centered-object integrated segmentation and recognition of overlapping handprinted characters. Neural Computation 5, 3 (1993), 419--429.Google ScholarDigital Library
Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Alain Boucher, and Rémy Mullot. 2013. Texture feature evaluation for segmentation of historical document images. In The 2nd International Workshop on Historical Document Imaging and Processing. ACM Press, New York, 102.Google ScholarDigital Library
Maroua Mehri, Pierre Héroux, Petra Gomez-Krämer, and Rémy Mullot. 2017. Texture feature benchmarking and evaluation for historical document image analysis. International Journal on Document Analysis and Recognition (IJDAR) 20, 1 (2017), 1--35.Google ScholarDigital Library
Maroua Mehri, Nibal Nayef, Pierre Héroux, Petra Gomez-Krämer, and Rémy Mullot. 2015. Learning texture features for enhancement and segmentation of historical document images. In The 3rd International Workshop on Historical Document Imaging and Processing. ACM Press, New York, 47--54.Google ScholarDigital Library
G. Nagy. 2000. Twenty years of document image analysis in PAMI. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1 (2000), 38--62.Google ScholarDigital Library
George Nagy and Sharad Seth. 1984. Hierarchical representation of optically scanned documents. In The International Conference on Pattern Recognition. IEEE, 347--349.Google Scholar
Y. Nakano, Y. Shima, H. Fujisawa, J. Higashino, and M. Fujinawa. 1990. An algorithm for the skew normalization of document image. In The 10th International Conference on Pattern Recognition, Vol. 2. IEEE Comput. Soc. Press, 8--13.Google Scholar
N. Nandini, K. Srikanta Murthy, and G. Hemantha Kumar. 2008. Estimation of skew angle in binary document images using hough transform. World Academy of Science, Engineering and Technology 18 (2008), 44--49.Google Scholar
Wayne; Niblack. 1986. An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs NJ. 115--116 pages.Google Scholar
Nikos Nikolaou, Michael Makridis, Basilis Gatos, Nikolaos Stamatopoulos, and Nikos Papamarkos. 2010. Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths. Image and Vision Computing 28, 4 (Apr. 2010), 590--604.Google ScholarDigital Library
Konstantinos Ntirogiannis, Basilis Gatos, and Ioannis Pratikakis. 2014. ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In The 14th International Conference on Frontiers in Handwriting Recognition. IEEE, 809--813.Google Scholar
L. O’Gorman. 1993. The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 11 (1993), 1162--1173.Google ScholarDigital Library
Oleg Okun, Matti Pietikäinen, O. Okun, and M. Pietikäinen. 1999. A survey of texture-based methods for document layout analysis. In Workshop on Texture Analysis in Machine Vision. 137--148.Google ScholarCross Ref
Sofia Ares Oliveira, Benoit Seguin, and Frederic Kaplan. 2018. dhSegment: A generic deep-learning approach for document segmentation. In The 16th International Conference on Frontiers in Handwriting Recognition. IEEE, 7--12.Google ScholarCross Ref
Nobuyuki Otsu. 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9, 1 (1979), 62--66.Google ScholarCross Ref
U. Pal and B. B. Chaudhuri. 1996. An improved document skew angle estimation technique. Pattern Recognition Letters 17, 8 (1996), 899--904.Google ScholarDigital Library
G. S. Peake and T. N. Tan. 1997. A general algorithm for document skew angle estimation. In The International Conference on Image Processing. IEEE Comput. Soc., 230--233.Google Scholar
Ihsin T. Phillips and Atul K. Chhabra. 1999. Empirical performance evaluation of graphics recognition systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 21, 9 (1999), 849--870.Google ScholarDigital Library
Ihsin T. Phillips, Jisheng Liang, Atul K. Chhabra, and Robert Haralick. 1997. A performance evaluation protocol for graphics recognition systems. In International Workshop on Graphics Recognition. Springer, 372--389.Google Scholar
Stefan Pletschacher and Apostolos Antonacopoulos. 2010. The PAGE (page analysis and ground-truth elements) format framework. In The 20th International Conference on Pattern Recognition. IEEE, 257--260.Google ScholarDigital Library
Wolfgang Postl. 1986. Detection of linear oblique structures and skew scan in digitized documents. In The 8th International Conference on Pattern Recognition. 687--689.Google Scholar
Ioannis Pratikakis, Konstantinos Zagoris, George Barlas, and Basilis Gatos. 2016. ICFHR2016 handwritten document image binarization contest (H-DIBCO 2016). In The 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 619--623.Google Scholar
Ioannis Pratikakis, Konstantinos Zagoris, George Barlas, and Basilis Gatos. 2017. ICDAR2017 competition on document image binarization (DIBCO 2017). In The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1. IEEE, 1395--1403.Google Scholar
Lorenzo Quirós. 2018. Multi-task handwritten document layout analysis. arXiv preprint arXiv:1806.08852 (2018).Google Scholar
Lorenzo Quirós, Llu´s Serrano, Vicente Bosch, Alejandro H. Toselli, Rosa Congost, Enric Saguer, and Enrique Vidal. 2018. HTR Dataset ICFHR 2018. https://zenodo.org/record/1322666#.XHOanOgzaUk.Google Scholar
Irina Rabaev, Ofer Biller, Jihad El-Sana, Klara Kedem, and Itshak Dinstein. 2013. Text line detection in corrupted and damaged historical manuscripts. In The 12th International Conference on Document Analysis and Recognition. IEEE, 812--816.
J. Y. Ramel, S. Leriche, M. L. Demonet, and S. Busson. 2007. User-driven page layout analysis of historical printed books. International Journal of Document Analysis and Recognition (IJDAR) 9, 2--4 (Apr. 2007), 243--261.
Marte A. Ramírez-Ortegón, Lilia L. Ramírez-Ramírez, Ines Ben Messaoud, Volker Märgner, Erik Cuevas, and Raúl Rojas. 2014. A model for the gray-intensity distribution of historical handwritten documents and its application for binarization. International Journal on Document Analysis and Recognition 17, 2 (2014), 139--160.
Tony M. Rath and Rudrapatna Manmatha. 2007. Word spotting for historical documents. International Journal on Document Analysis and Recognition 9, 2 (2007), 139--152.
Ahsen Raza, Imran Siddiqi, Ali Abidi, and Fahim Arif. 2012. An unconstrained benchmark urdu handwritten sentence database with automatic line segmentation. In International Conference on Frontiers in Handwriting Recognition. IEEE, 491--496.
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention. Springer, 234--241.
Raid Saabni, Abedelkadir Asi, and Jihad El-Sana. 2014. Text line extraction for historical document images. Pattern Recognition Letters 35, 1 (2014), 23--33.
Raid Saabni and Jihad El-Sana. 2011. Language-independent text lines extraction using seam carving. In The International Conference on Document Analysis and Recognition. IEEE, 563--568.
Rana S. M. Saad, Randa I. Elanwar, N. S. Abdel Kader, Samia Mashali, and Margrit Betke. 2016. BCE-Arabic-v1 dataset: Towards interpreting arabic document images for people with visual impairments. In The 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments - PETRA. ACM Press, New York, New York, USA, 1--8.
T. Saitoh, M. Tachikawa, and T. Yamaai. 1993. Document image segmentation and text area ordering. In The 2nd International Conference on Document Analysis and Recognition. IEEE Comput. Soc. Press, 323--329.
P. Saragiotis and N. Papamarkos. 2008. Local skew correction in documents. International Journal of Pattern Recognition and Artificial Intelligence 22, 4 (2008), 691--710.
M. Sarfraz, S. A. Mahmoud, and Z. Rasheed. 2007. On skew estimation and correction of text. In Computer Graphics, Imaging and Visualisation. IEEE, 308--313.
Eric Saund, Jing Lin, and Prateek Sarkar. 2009. PixLabeler: User interface for pixel-level labeling of elements in document images. In The 10th International Conference on Document Analysis and Recognition. IEEE, 646--650.
J. Sauvola and M. Pietikäinen. 1995. Skew angle detection using texture direction analysis. In The 9th Scandinavian Conference on Image Analysis. 1099--1106.
J. Sauvola and M. Pietikäinen. 2000. Adaptive document image binarization. Pattern Recognition 33, 2 (2000), 225--236.
Seong-Whan Lee and Dae-Seok Ryu. 2001. Parameter-free geometric document layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 11 (2001), 1240--1256.
Mathias Seuret, Michele Alberti, Marcus Liwicki, and Rolf Ingold. 2017. PCA-initialized deep neural networks applied to document image analysis. In The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 877--882.
F. Shafait and T. M. Breuel. 2011. The effect of border noise on the performance of projection-based page segmentation methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 4 (2011), 846--851.
F. Shafait, D. Keysers, and T. M. Breuel. 2006. Pixel-accurate representation and evaluation of page segmentation in document images. In The 18th International Conference on Pattern Recognition. IEEE, 872--875. DOI:https://doi.org/10.1109/ICPR.2006.934
Faisal Shafait, Joost van Beusekom, Daniel Keysers, and Thomas M. Breuel. 2008. Background variability modeling for statistical layout analysis. In The 19th International Conference on Pattern Recognition. IEEE, 1--4.
Mahnaz Shafii and Maher Sid-Ahmed. 2015. Skew detection and correction based on an axes-parallel bounding box. International Journal on Document Analysis and Recognition (IJDAR) 18, 1 (2015), 59--71.
Asif Shahab. 2013. UW3 and UNLV Datasets. http://www.iapr-tc11.org/mediawiki/index.php/Table_Ground_Truth_for_the_UW3_and_UNLV_datasets.
Zhixin Shi and Venu Govindaraju. 2004. Line separation for complex document images using fuzzy runlength. In The 1st International Workshop on Document Image Analysis for Libraries. 306--312.
Zhixin Shi, Srirangaraj Setlur, and Venu Govindaraju. 2009. A steerable directional local profile technique for extraction of handwritten Arabic text lines. In The 10th International Conference on Document Analysis and Recognition. IEEE, 176--180.
Frank Y. Shih and Shy-Shyan Chen. 1996. Adaptive document block segmentation and classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 26, 5 (1996), 797--802.
P. Shivakumara, G. Hemantha Kumar, D. S. Guru, and P. Nagabhushan. 2005. A novel technique for estimation of skew in binary text document images based on linear regression analysis. Sadhana 30, 1 (2005), 69--85.
Fotini Simistira, Manuel Bouillon, Mathias Seuret, Marcel Wursch, Michele Alberti, Rolf Ingold, and Marcus Liwicki. 2017. ICDAR2017 competition on layout analysis for challenging medieval manuscripts. In The 14th IAPR International Conference on Document Analysis and Recognition. IEEE, 1361--1370.
A. Simon, J.-C. Pret, and A. P. Johnson. 1997. A fast algorithm for bottom-up document layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 3 (1997), 273--277.
Brij Mohan Singh, Rahul Sharma, Debashis Ghosh, and Ankush Mittal. 2014. Adaptive binarization of severely degraded and non-uniformly illuminated documents. International Journal on Document Analysis and Recognition (IJDAR) 17, 4 (2014), 393--412.
Chandan Singh, Nitin Bhatia, and Amandeep Kaur. 2008. Hough transform based fast skew detection and accurate skew correction methods. Pattern Recognition 41, 12 (2008), 3528--3546.
Bolan Su, Shijian Lu, and Chew Lim Tan. 2010. Binarization of historical document images using the local maximum and minimum. In The 8th IAPR International Workshop on Document Analysis Systems. ACM Press, New York, 159--166.
Wassim Swaileh, Kamel Ait Mohand, and Thierry Paquet. 2015. Multi-script iterative steerable directional filtering for handwritten text line extraction. In The 13th International Conference on Document Analysis and Recognition. IEEE, 1241--1245.
D. Sylwester and S. Seth. 1995. A trainable, single-pass algorithm for column segmentation. In The 3rd International Conference on Document Analysis and Recognition, Vol. 2. IEEE Comput. Soc. Press, 615--618.
Thomas Breuel and Faisal Shafait. 2010. AutoMLP: Simple, effective, fully automated learning rate and size adjustment. In The Learning Workshop, Utah.
Tuan Anh Tran, In-Seop Na, and Soo-Hyung Kim. 2015. Hybrid page segmentation using multilevel homogeneity structure. In The 9th International Conference on Ubiquitous Information Management and Communication. ACM Press, New York, 1--6.
Tuan Anh Tran, In Seop Na, and Soo Hyung Kim. 2016. Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology. International Journal on Document Analysis and Recognition (IJDAR) 19, 3 (Sep. 2016), 191--209.
Nikos Vasilopoulos and Ergina Kavallieratou. 2017. Complex layout analysis based on contour classification and morphological operations. Engineering Applications of Artificial Intelligence 65 (2017), 220--229.
Friedrich M. Wahl, Kwan Y. Wong, and Richard G. Casey. 1982. Block segmentation and text extraction in mixed text/image documents. Computer Graphics and Image Processing 20, 4 (Dec. 1982), 375--390.
Hao Wei, Micheal Baechler, Fouad Slimane, and Rolf Ingold. 2013. Evaluation of SVM, MLP and GMM classifiers for layout analysis of historical documents. In The 12th International Conference on Document Analysis and Recognition. IEEE, 1220--1224.
Hao Wei, Kai Chen, Rolf Ingold, and Marcus Liwicki. 2014. Hybrid feature selection for historical document layout analysis. In The 14th International Conference on Frontiers in Handwriting Recognition. 87--92.
Hao Wei, Kai Chen, Anguelos Nicolaou, Marcus Liwicki, and Rolf Ingold. 2014. Investigation of feature selection for historical document layout analysis. In The 4th International Conference on Image Processing Theory, Tools and Applications. 1--6.
Florian Westphal, Niklas Lavesson, and Håkan Grahn. 2018. Document image binarization using recurrent neural networks. In The 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 263--268.
Christoph Wick and Frank Puppe. 2018. Fully convolutional neural networks for page segmentation of historical document images. In The 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 287--292.
Chung-Chih Wu, Chien-Hsing Chou, and Fu Chang. 2008. A machine-learning approach for analyzing document layout structures with two reading orders. Pattern Recognition 41, 10 (2008), 3200--3213.
Yi Xiao and Hong Yan. 2003. Text region extraction in a document image based on the Delaunay tessellation. Pattern Recognition 36, 3 (2003), 799--809.
Yi Xiao and Hong Yan. 2004. Location of title and author regions in document images based on the Delaunay triangulation. Image and Vision Computing 22, 4 (2004), 319--329.
H. Yan. 1993. Skew correction of document images using interline cross-correlation. Graphical Models and Image Processing 55, 6 (1993), 538--543.
Younki Min, Sung-Bae Cho, and Yillbyung Lee. 1996. A data reduction method for efficient document skew estimation based on Hough transformation. In The 13th International Conference on Pattern Recognition, Vol. 3. IEEE, 732--736.
Bin Yu and Anil K. Jain. 1996. A robust and fast skew detection algorithm for generic documents. Pattern Recognition 29, 10 (Oct. 1996), 1599--1629.
Yue Lu and C. L. Tan. 2005. Constructing area Voronoi diagram in document images. In The 8th International Conference on Document Analysis and Recognition, Vol. 1. IEEE, 342--346.
A. Zahour, B. Taconet, P. Mercy, and S. Ramdane. 2001. Arabic hand-written text-line extraction. In The 6th International Conference on Document Analysis and Recognition. IEEE Comput. Soc., 281--285.
Yefeng Zheng and David Doermann. 2010. LAMP Dataset of Layer Separation. https://lampsrv02.umiacs.umd.edu/projdb/project.php?id=61.

Index Terms

Document Layout Analysis: A Comprehensive Survey
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Information systems
  1. Information retrieval

Published in
ACM Computing Surveys Volume 52, Issue 6
November 2020
806 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3368196
Editor: Sartaj Sahni, Department of Computer and Information Science and Engineering
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 October 2019
- Revised: 1 July 2019
- Accepted: 1 July 2019
- Received: 1 July 2018
Author Tags
Document segmentation
document image retrieval
document image understanding
document structure analysis
layout analysis
physical document structure
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 97
- Total Downloads: 3,129
- Downloads (Last 12 months): 541
- Downloads (Last 6 weeks): 84