Editorial for special issue on “advanced topics in document analysis and recognition”

Kise, Koichi; Zanibbi, Richard; Jain, Rajiv; Fink, Gernot A.

doi:10.1007/s10032-023-00448-5

Editorial
Published: 03 August 2023

Volume 26, pages 171–173, (2023)

International Journal on Document Analysis and Recognition (IJDAR)

The ongoing advancement of deep learning techniques, such as the Transformer and Large Language Models, continues to enhance both the accuracy and efficiency of methods in the field of Document Analysis and Recognition, while also broadening its scope. The primary objective of this Special Issue is to keep pace with these developments and showcase the latest advancements in document analysis and recognition. Upholding our tradition established for journal track papers from the ICDAR conference with issues released in 2019 and 2021, we present the third edition of the Special Issue, under the same title.

Despite the persistent effects of the COVID-19 pandemic, we issued a call for papers and disseminated it widely through the web pages of IJDAR and ICDAR. By November 2022, we had received 33 submissions that were deemed relevant to the scope of the special issue. Each submission was assigned to one of us as guest editor, with careful attention paid to avoiding conflicts of interest. We obtained reviews from experts in the field, adhering to the journal's standard practices. After a rigorous review process, often involving two or three rounds, we accepted 13 papers for publication in this special issue. These papers reflect both the breadth and depth of current research in the field of Document Analysis and Recognition.

The accepted papers can be grouped into several categories: document image processing and classification (two papers), historical document analysis (four papers), character recognition (one paper), online handwriting analysis (two papers), layout analysis (two papers), and applications (two papers). We will briefly summarize the accepted papers in the order stated.

1 Document image processing and classification

This category includes two papers.

The first paper by Felix Hertlein, Alexander Naumann, and Patrick Philipp introduces a novel dataset, Inv3D, for the conventional task of unwarping document images. Inv3D comprises high-resolution images with complex 3D structures, distinguishing it from other existing unwarping datasets. The authors also put forth a unique image unwarping algorithm that utilizes structured templates to enhance performance.

The second paper by Saifullah Saifullah, Stefan Agne, Andreas Dengel, and Sheraz Ahmed tackles another standard task, document image classification, in the era of deep learning. They emphasize that deep learning models outperform non-deep counterparts when ample annotated data are available, but acknowledge that producing such annotations is often cost-prohibitive. Their solution is to employ active learning, which reduced the annotation effort by 15–40% while achieving results comparable to those of models trained on fully annotated data.
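As a rough illustration of the general idea behind active learning, the following minimal sketch picks the unlabeled samples on which the current model is least certain, using predictive entropy as the uncertainty measure. The query strategy, the function names, and the toy probabilities are assumptions made for illustration; they are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(predictions, budget):
    """Return the indices of the `budget` most uncertain samples.

    `predictions` holds one class-probability list per unlabeled sample,
    as produced by the current model (hypothetical input format).
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy pool of four unlabeled documents with three candidate classes each.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform, very uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
    [0.50, 0.49, 0.01],  # torn between two classes
]
queried = select_for_annotation(pool, budget=2)
print(queried)  # indices of the samples to send to a human annotator
```

Only the selected samples would be labeled and added to the training set before the model is retrained, which is how such a loop can save annotation effort.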

2 Historical document analysis

This category comprises four papers.

The first paper resides at the intersection of historical document analysis and document image processing. Boraq Madi, Reem Alasam, Jihad El-Sana, and Raeed Shammas tackle a unique problem related to a type of document known as a palimpsest: a manuscript page from which the text has been erased so that the page can be reused for a different document. Reconstructing the erased text is challenging, owing not only to the overwritten text on top of it but also to a variety of distortions affecting the erased text. In their paper, the authors leverage Generative Adversarial Networks (GANs) to achieve state-of-the-art performance in this complex task.

Florian Kordon, Nikolaus Weichselbaumer, Randall Herz, Stephen Mossman, Edward Potten, Mathias Seuret, Martin Mayr, and Vincent Christlein address the identification of glyphs in historical documents, a task made difficult by the vast variety of type designs and the range of printing and preservation conditions. Their proposed solution is a joint energy-based model capable of classifying individual glyphs and rejecting out-of-distribution (OOD) samples. Experimental results suggest that this method can deliver performance on par with purely discriminative methods, while significantly enhancing OOD detection rates.
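To make the combination of classification and OOD rejection concrete, here is a minimal sketch of how classifier logits are commonly reinterpreted in joint energy-based models: the softmax over the logits gives the class posterior, while the negative log-sum-exp of the same logits serves as an energy, with high energy suggesting an out-of-distribution input. The threshold, the toy logits, and the function names are assumptions for illustration; the editorial does not specify the paper's exact scoring rule.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(z))) over a list of logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def softmax(logits):
    """Class posterior p(y | x) derived from the logits."""
    lse = log_sum_exp(logits)
    return [math.exp(z - lse) for z in logits]


def energy(logits):
    """Energy E(x) = -log sum_y exp(f(x)[y]); lower means more in-distribution."""
    return -log_sum_exp(logits)


def classify_or_reject(logits, threshold):
    """Return the predicted class index, or None if the input looks OOD.

    `threshold` is a hypothetical value that would be tuned on held-out data.
    """
    if energy(logits) > threshold:
        return None
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


known_glyph = [9.0, 1.0, 0.5]    # toy logits: one class clearly dominates
unknown_glyph = [0.2, 0.1, 0.3]  # toy logits: uniformly low evidence

print(classify_or_reject(known_glyph, threshold=-5.0))
print(classify_or_reject(unknown_glyph, threshold=-5.0))
```

The same network output thus drives both decisions, which is the appeal of the joint formulation over a separate OOD detector.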

The paper by Najoua Rahal, Lars Vögtlin, and Rolf Ingold offers a deep learning approach to analyzing the layout of historical documents. The prevailing issue when applying deep learning is the lack of extensive labeled datasets. The authors address this issue with two strategies for pre-training the network on controlled data: the first pre-trains on the actual labeling task using artificial data, while the second pre-trains on a pretext task using real data. Experimental outcomes from text line extraction and classification tasks indicate that the proposed method can decrease training time without sacrificing performance and, in some instances, even improve it.

Finally, Solène Tarride, Martin Maarand, Mélodie Boillet, James McGrath, Eugénie Capel, Hélène Vézina, and Christopher Kermorvant outline a comprehensive workflow for extracting information from handwritten parish registers. The authors' proposed workflow encapsulates various steps including page classification, text line detection, handwritten text recognition, named entity recognition, act detection and classification, concluding with a validation phase. To evaluate this workflow, they utilized two million pages of Quebec parish registers dating from the nineteenth and twentieth centuries.

3 Character recognition

Within this category, we feature a single paper penned by Esma F. Bilgin Tasdemir. The method proposed in the paper centers on recognizing printed Ottoman script through the combination of CNN and bidirectional LSTM. The author has trained the network utilizing synthetic data and data augmentation techniques. The method's efficacy was evaluated using a printed historical Ottoman book. The outcome of the text line image recognition task showed a character error rate (CER) of 11% for synthetic data and 16% for real data.
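For readers unfamiliar with the metric, the character error rate is conventionally computed as the Levenshtein (edit) distance between the recognized text and the reference transcription, divided by the length of the reference. The sketch below follows that standard definition; the function names and example strings are illustrative assumptions, and the paper's own evaluation code may differ in details such as text normalization.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `reference` into `hypothesis`."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character in an 11-character reference line.
cer = character_error_rate("recognition", "recogmition")
print(f"CER = {cer:.1%}")
```

Under this definition, a CER of 16% means roughly one character edit is needed for every six reference characters.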

4 Online handwriting analysis

This category includes two papers.

The first paper in this category by Wassim Swaileh, Florent Imbert, Yann Soullard, Romain Tavenard, and Eric Anquetil addresses the reconstruction of handwriting trajectories from the signals of a digital pen sensor. The authors use dynamic time warping to address the issue of disparate sampling rates, while a temporal convolutional network aids the reconstruction process. A performance assessment on a benchmark dataset shows that the proposed method outperforms existing methods.
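As a reminder of how dynamic time warping copes with different sampling rates, the sketch below implements the classic dynamic-programming formulation for two one-dimensional sequences. It is a textbook version under simplifying assumptions (scalar samples, absolute difference as local cost, no windowing constraint), not the authors' implementation, and the variable names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] is the minimal accumulated cost of aligning the first i
    samples of `a` with the first j samples of `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# The same ramp sampled at two different rates aligns with zero cost.
slow_signal = [0.0, 1.0, 2.0, 3.0]
fast_signal = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(dtw_distance(slow_signal, fast_signal))
```

Because one sample may be matched to several samples of the other sequence, the two signals do not need to have the same length or rate.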

The second paper by Ahmad Mustafid, Junaid Younas, Paul Lukowicz, and Sheraz Ahmed explores online handwriting classification. The authors enrich existing online handwriting datasets with an innovative labeling approach that operates at three distinct levels: stroke, word, and line. This enhanced dataset, named IAMonSense, is made available for public use. The team also provides benchmark results for the dataset on a handwriting classification task with three classes: text, math, and graph. They evaluated a variety of network structures, including traditional CNNs, graph neural networks, attention-based neural networks, and transformers, to illustrate performance variations at the three different levels.

5 Layout analysis

This category consists of two deep learning-based papers, one focusing on zone detection and the other on text line extraction.

The zone detection paper by Alexander Gayer, Daria Ershova, and Vladimir V. Arlazarov presents a novel method for detecting machine readable zones (MRZs). MRZ recognition plays a crucial role in extracting personal data from passports, IDs, and visas. The paper's emphasis is on reducing computational requirements to facilitate use on mobile devices. The proposed YOLO-MRZ method proves highly efficient, running 83 times faster than its counterpart, Tiny YOLO.

The second paper by Adeela Islam, Tayaba Anjum, and Nazar Khan introduces an innovative approach to extracting text lines from handwritten documents. The authors utilize the Mask R-CNN framework, considering text lines as objects for this purpose. The model is trained in an end-to-end fashion, devoid of any dataset-specific pre- or post-processing. The method's efficacy was tested across nine varied datasets, including a diverse array of handwriting scripts, layouts, page backgrounds, line orientations, and spacings.

6 Applications

The first paper in this category, on music recognition, is by Antonio Ríos-Vila, David Rizo, José M. Iñesta, and Jorge Calvo-Zaragoza. The study addresses an ongoing challenge in optical music recognition: the recognition of pianoform musical scores, which is difficult because of their structural complexity. Taking an end-to-end approach, the authors implemented three distinct models: encoder-only (FCN), RNN decoding (CRNN), and transformer decoding (CNNT). They also created new datasets, named GrandStaff (printed and distorted), for evaluating these methods, which is another significant contribution to the field.

The second paper in the applications category is written by Muhammad Nauman Ahmed Bhatti, Imran Siddiqi, and Momina Moetesum. In contrast to other papers in this special issue, this research applies natural language processing techniques to news story segmentation, specifically focusing on the Urdu language. The authors used an LSTM-based Siamese network model, which was trained on positive (belonging to the same story) and negative (belonging to different stories) pairs of sentences. The authors conducted their experiments using two separate datasets.
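The positive and negative pairs used to train a Siamese model can be illustrated with a small sketch that labels every pair of sentences by whether both come from the same story. The input format, the function name, and the placeholder sentences are assumptions for illustration; the paper may sample pairs differently, for example by using only neighboring sentences.

```python
from itertools import combinations


def make_sentence_pairs(sentences):
    """Build (sentence_a, sentence_b, label) triples for Siamese training.

    `sentences` is a list of (story_id, text) tuples (hypothetical format).
    The label is 1 when both sentences belong to the same story, else 0.
    """
    pairs = []
    for (story_a, text_a), (story_b, text_b) in combinations(sentences, 2):
        pairs.append((text_a, text_b, 1 if story_a == story_b else 0))
    return pairs


# Placeholder sentences standing in for a transcribed news broadcast.
broadcast = [
    ("story-1", "The council approved the new budget."),
    ("story-1", "Spending on schools will rise next year."),
    ("story-2", "Heavy rain is expected over the weekend."),
    ("story-2", "Residents are advised to avoid low-lying roads."),
]
pairs = make_sentence_pairs(broadcast)
positives = [p for p in pairs if p[2] == 1]
negatives = [p for p in pairs if p[2] == 0]
print(len(positives), len(negatives))
```

A Siamese network encodes both sentences of a pair with shared weights and learns to score positive pairs as similar, so a drop in similarity between consecutive sentences can signal a story boundary.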

We are convinced that these contributions showcase innovative trajectories in the field of document analysis and recognition, encompassing a broad spectrum of foundational technologies and application domains. We express our profound gratitude to all the authors who generously shared their cutting-edge research and to all reviewers who provided their insightful critiques within a tight timeline. Additionally, we extend our sincere thanks to the IJDAR team members, namely Katherine Moretti and Priya Verma, and IJDAR's Editors-in-Chief, Daniel Lopresti and Simone Marinai, for their unwavering support and invaluable feedback.

Author information

Authors and Affiliations

Graduate School of Informatics, Osaka Metropolitan University, 1-1 Gakuencho, Naka, Sakai, Osaka, 599-8531, Japan
Koichi Kise
Department of Computer Science, Rochester Institute of Technology, 102 Lomb Memorial Drive, Rochester, NY, 14623-5608, USA
Richard Zanibbi
Adobe Research, 7878 Diamondback Dr., College Park, MD, 20742, USA
Rajiv Jain
Department of Computer Science, TU Dortmund University, Otto-Hahn-Strasse 16, 44225, Dortmund, Germany
Gernot A. Fink

Corresponding author

Correspondence to Koichi Kise.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kise, K., Zanibbi, R., Jain, R. et al. Editorial for special issue on “advanced topics in document analysis and recognition”. IJDAR 26, 171–173 (2023). https://doi.org/10.1007/s10032-023-00448-5

Accepted: 02 July 2023
Published: 03 August 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10032-023-00448-5
