Presentation + Paper
Noise characterization for historical documents with physical distortions
1 April 2020
Abstract
Physical distortions (such as tear-offs and scratches) are common in historical documents. They interfere with downstream processes such as optical character recognition (OCR) and layout analysis, which reduces the effectiveness of automatic document information retrieval. A proper characterization of such physical noise is therefore an important step in developing denoising methods for historical documents. In this paper, we approach noise characterization through Bayesian labeling, in which noise and text pixels are characterized by their likelihood densities. In particular, we employ two significance measures, formulated using pointwise and cone-of-influence (COI) approximations of local Lipschitz regularity in the wavelet domain. We evaluate the proposed noise characterization with a binary noise-versus-text classification model and show that a naive binary classifier based on the average point ratio (APR) or average cone ratio (ACR) distribution densities separates noise and text pixels effectively, with encouraging overall success rates. These results motivate further work on Bayesian frameworks for recognizing physical distortions in historical documents.
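The two ingredients named in the abstract, local Lipschitz regularity estimated in the wavelet domain (pointwise and within a cone of influence) and Bayesian labeling from class likelihood densities, can be illustrated with the minimal sketch below. It is not the authors' method. The abstract does not define APR or ACR, so the sketch makes several assumptions: a 1D signal stands in for an image row, the wavelet is a first derivative of a Gaussian (one vanishing moment, so exponents up to about 1 can be resolved), the transform is L1-normalized so that |Wf(u, s)| scales like s^alpha near a point of Lipschitz exponent alpha, and the scales and cone constant `c` are arbitrary choices. All function names are hypothetical, and the classifier uses a generic scalar feature with histogram likelihoods.

```python
import math

def dog_wavelet(t):
    # First derivative of a Gaussian (up to sign): one vanishing moment.
    return -t * math.exp(-0.5 * t * t)

def cwt(signal, u, s):
    # L1-normalised continuous wavelet transform W f(u, s) by direct summation;
    # samples outside the signal are treated as zero.
    return sum(x * dog_wavelet((t - u) / s) / s for t, x in enumerate(signal))

def slope(xs, ys):
    # Ordinary least-squares slope of ys against xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def lipschitz_pointwise(signal, x0, scales=(4, 8, 16, 32)):
    # Pointwise estimate: fit log2|W f(x0, s)| against log2 s; the slope approximates alpha.
    logs = [math.log2(s) for s in scales]
    mags = [math.log2(abs(cwt(signal, x0, s)) + 1e-12) for s in scales]
    return slope(logs, mags)

def lipschitz_cone(signal, x0, scales=(4, 8, 16, 32), c=1.0):
    # Cone-of-influence estimate: use the largest modulus inside |u - x0| <= c * s.
    logs, mags = [], []
    for s in scales:
        half = int(c * s)
        m = max(abs(cwt(signal, u, s)) for u in range(x0 - half, x0 + half + 1))
        logs.append(math.log2(s))
        mags.append(math.log2(m + 1e-12))
    return slope(logs, mags)

def histogram_density(samples, lo, hi, bins):
    # Histogram estimate of p(feature | class) on [lo, hi], with add-one smoothing.
    width = (hi - lo) / bins
    counts = [1] * bins
    def bin_of(v):
        return min(bins - 1, max(0, int((v - lo) / width)))
    for v in samples:
        counts[bin_of(v)] += 1
    total = sum(counts)
    def density(v):
        return counts[bin_of(v)] / (total * width)
    return density

def map_label(v, p_noise, p_text, prior_noise=0.5):
    # Bayesian (MAP) labelling: choose the class with the larger posterior.
    noise_score = p_noise(v) * prior_noise
    text_score = p_text(v) * (1.0 - prior_noise)
    return "noise" if noise_score >= text_score else "text"
```

For example, `lipschitz_cone([abs(t - 256) ** 0.5 for t in range(512)], 256)` should come out close to 0.5. The pointwise estimate fails on that symmetric cusp because an antisymmetric wavelet centred exactly on it returns (almost) zero at every scale, which is one reason a cone-of-influence variant is useful alongside the pointwise one.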
© (2020) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tan Lu, Dejan Ilic, and Ann Dooms "Noise characterization for historical documents with physical distortions", Proc. SPIE 11353, Optics, Photonics and Digital Technologies for Imaging Applications VI, 113530F (1 April 2020); https://doi.org/10.1117/12.2559694
CITATIONS
Cited by 1 scholarly publication.
KEYWORDS
Wavelets

Denoising

Image denoising

Image processing

Optical character recognition

Performance modeling

Wave propagation
