Persian/Arabic handwritten word recognition using M-band packet wavelet transform

doi:10.1016/j.imavis.2007.09.004

Image and Vision Computing

Volume 26, Issue 6, 2 June 2008, Pages 829-842

Abstract

The extraction of rotation and scale invariant features is an essential problem in document image analysis. This paper proposes an effective rotation and scale invariant holistic handwritten word recognition scheme. The approach uses the M-band packet wavelet transform to extract a feature vector from a Farsi word image. The extracted global and local features are exploited to recognize handwritten words from a limited-size lexicon. To obtain rotation and scale invariant features, a polar transform is first applied to the word image to eliminate rotation and scale effects. This produces an M-row shifted polar image, which is then passed to a row shift invariant M-band wavelet packet transform to eliminate the row shift effects. The output wavelet coefficients are therefore rotation and scale invariant. For each subband of these wavelet coefficients, a set of local energy features is computed, and feature vectors are extracted from the subbands. The proposed polar M-band wavelet features have been tested by employing a Mahalanobis classifier to classify a set of distinct natural handwritten Farsi words. We compared the proposed scheme with two well-known rotation invariant methods: Fourier-wavelet and Zernike moments. The experimental results show that the proposed algorithm improves the recognition rate by about 12 percent.

Introduction

With the progress in information technology and the increased demand for information, the number of documents containing information has grown steadily. Despite the use of electronic documents, the number of printed or handwritten documents has never decreased. However, storing and retrieving these ever-increasing paper documents is complicated and cumbersome. Electronic documents, on the other hand, have several advantages in storage, retrieval, search and updating. Document image analysis covers the algorithms for transforming documents into electronic format.

Document image analysis has been a topic of research for almost three decades. A large number of research papers and reports have already been published on Latin, Chinese and Japanese characters. However, little work has been performed on automatic recognition of Farsi documents, mostly because of the cursive nature of Farsi writing. Therefore, this area of research is still an open field.

Document image analysis involves specifying the geometry of the maximal homogeneous regions and classifying them into text, image, table, drawing, etc. Images are then compressed, tables are reconstructed, and drawings are vectorized. Text regions are segmented into isolated words and recognized. This paper focuses on the word recognition part of document image analysis.

There are two approaches to word recognition: analytical and holistic. The first treats a word as a collection of simpler subunits such as characters and proceeds by segmenting the word into these units, identifying the units and building a word-level interpretation using the lexicon. The second treats the word as a single, indivisible entity and attempts to recognize it using features of the word as a whole [1], [2], [3], [4]. This approach is referred to as word-based or holistic and is inspired in part by psychological studies of human reading, which indicate that humans use features of word shape such as length, ascenders, and descenders in reading (Fig. 1). Holistic strategies employ top-down approaches for recognizing the whole word, which eliminates the segmentation problem.

This paper presents a novel holistic scheme for handwritten Farsi word recognition (HFWR) via the packet wavelet transform. Some applications of HFWR are the recognition of handwritten check amounts, postal addresses, forms, and license plates, and the automatic filing of faxes. The handwritten text must be located, extracted, made free of artifacts, separated into lines if necessary, and, finally, into individual words before recognition. These steps are generally nontrivial. This paper assumes that the complex task of segmenting the handwritten word or phrase of interest from its surroundings has already been accomplished by prior processes. The tasks of segmentation and recognition are generally accomplished sequentially based on packet wavelet features of the word image. This paper focuses on the recognition of the isolated word or phrase using the appropriate lexicon. Fig. 2 depicts an example where a word has been passed to the recognition system; based on the lexicon, the system outputs the possible candidates for the input.

This paper proposes an effective holistic scheme for rotation and scale invariant handwritten word recognition using the M-band packet wavelet transform. The approach uses the M-band packet wavelet transform to extract a feature vector from a Farsi word image. The global and local features of the word image are exploited to recognize handwritten Farsi words from a limited-size lexicon. As the size of the lexicon gets larger, the complexity of the algorithms increases linearly due to the need for a larger search space and a more complex pattern representation. Additionally, the recognition rates decrease rapidly due to the decrease in between-class variances in the feature space.

The extraction of rotation and scale invariant features is always an important problem in content-based image analysis. Proposals can be found in the literature that specifically address the problem of rotation and scale invariant image analysis [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. Chen and Bui [15] proposed an invariant Fourier-wavelet descriptor for Chinese character recognition, but its performance may drop significantly for databases of more general images. Several authors [16], [17], [18], [19] extract 1D invariant features from 2D patterns. The advantage of this approach is that it saves space in the database and reduces the time needed to match against the whole database. The drawback is a low recognition rate, because less information from the original pattern is retained.

The use of moment invariant features for identification and inspection of 2D shapes has received much attention [20]. Various types of image moments, including geometrical, Legendre, Zernike, pseudo-Zernike, Fourier–Mellin, and complex, have been evaluated in terms of noise sensitivity, information redundancy, and capability of image description [21]. That evaluation found Zernike moments to be superior. Khotanzad and Hong [22] have also compared Hu’s moment invariants and the magnitudes of Zernike moments in rotation invariant recognition of characters and shapes. Their results clearly show that Zernike moments are superior to Hu’s moments. However, invariant features of Zernike moments are sensitive to noise, and once the number of features reaches a certain limit, classification accuracy may not increase with additional features. The accuracy of moment invariant features may also drop significantly when the number of image classes increases.

This paper introduces an effective rotation and scale invariant handwritten word recognition scheme based on the M-band packet wavelet transform. To obtain rotation and scale invariant features, a polar transform [23], [24] is applied to the word image to eliminate the rotation and scaling effects. This produces an M-row shifted polar image, which is then passed to a row shift invariant M-band wavelet packet transform to eliminate the row shift effects, so the output wavelet coefficients are rotation and scale invariant. Then, for each subband of these wavelet coefficients, a set of local energy features is computed. The local energy features were only computed for the coefficients of the last wavelet subband. We extracted feature vectors from the subbands of wavelet coefficients for M = 2, 3 and 4, which gives three independent feature vectors for handwritten word classification, and we compare the classification results for M = 2, 3 and 4. The proposed polar M-band wavelet feature has been tested using a Mahalanobis classifier to classify a set of distinct natural handwritten Farsi words provided by different people with different handwriting styles.
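The role of the polar transform can be illustrated with a small sketch. It is not the authors' implementation: the function name `polar_transform`, the grid sizes, the nearest-neighbour sampling and the random stand-in image are all assumptions made for illustration. The sketch resamples an image on an (angle, radius) grid and checks that rotating the input by 90 degrees shows up as a circular shift of the rows of the polar image, which is the effect the row shift invariant transform is meant to absorb.

```python
import numpy as np

def polar_transform(image, n_theta=64, n_radius=16):
    """Resample a square image on a polar grid centred on the image.

    Rows index the angle, columns index the radius (nearest-neighbour
    sampling). Radii are fractions of the largest inscribed circle, so the
    grid does not depend on the pixel size of the input.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    radii = r_max * np.arange(1, n_radius + 1) / n_radius
    ys = cy + np.outer(np.sin(thetas), radii)
    xs = cx + np.outer(np.cos(thetas), radii)
    yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    xi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return image[yi, xi]

rng = np.random.default_rng(0)
word = rng.random((33, 33))            # hypothetical stand-in for a word image
p0 = polar_transform(word)
p90 = polar_transform(np.rot90(word))  # same image rotated by 90 degrees

# A quarter turn of the input corresponds to a shift of n_theta / 4 rows.
quarter = p0.shape[0] // 4
# Fraction of polar samples that match once the row shift is undone.
agreement = float(np.mean(p90 == np.roll(p0, -quarter, axis=0)))
print(p0.shape, agreement)
```

Any feature that is unchanged by circular row shifts of the polar image is then unchanged by rotation of the original image, which is why the scheme pairs the polar transform with a row shift invariant decomposition.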

The outline of this paper is as follows. The next section briefly reviews the standard 2D M-band wavelet packet decomposition and filter bank design techniques. Section 3 introduces our proposed scheme for extracting the rotation and scale invariant polar M-band wavelet energy from any given image. Section 4 presents the classification results for different kinds of energy measures and different bands of energy features on handwritten test data sets, and compares the classification performance and noise robustness of the proposed method with other rotation invariant handwritten word recognition methods. The final section draws the conclusions.

Section snippets

M-band packet wavelet transforms

The wavelet transform maps a function f(x) ∈ L²(R) onto a scale–space plane. The wavelets are obtained from a single prototype function ψ(x) by scalings a and shifts b. The continuous wavelet transform of a function is given as: $Wf_{a}(b) = \int f(x)\, \psi_{a,b}^{*}(x)\, dx$. Both the practical and mathematical interpretations of wavelets seem to be best served by using the concept of resolution to define the effects of changing scale. To do this, we will start with a scaling function φ(t) rather than directly with the wavelet ψ(

Rotation and scale invariant M-band packet wavelet transform

Discrete M-band wavelet transform (DMWT) has been shown to be useful for image analysis [28], [26], [33], because wavelets have finite duration which provides both the frequency and spatial locality and efficient implementation. The hierarchical one-dimensional discrete M-band wavelet transform uses a set of M filters derived from wavelet functions to decompose the original signal into M subbands: details and approximation. The decomposition process is recursively applied to the approximation
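As a rough illustration of the recursive M-band split described here, the sketch below decomposes a 1D signal into M subbands per level and recurses on the approximation. It makes several assumptions that are not from the paper: the analysis filters are the rows of an orthonormal DCT-II matrix (which reduces to the Haar pair for M = 2), the names `dct_filter_bank` and `m_band_decompose` are hypothetical, and the transform is the plain decimated version rather than the shift invariant packet transform the paper uses.

```python
import numpy as np

def dct_filter_bank(M):
    """Return M orthonormal analysis filters of length M (DCT-II rows).

    Row 0 acts as the low-pass (approximation) filter, rows 1..M-1 as the
    detail filters. This is an illustrative choice, not the paper's design.
    """
    n = np.arange(M)
    k = n[:, None]
    H = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * M))
    H[0] /= np.sqrt(2.0)
    return H

def m_band_decompose(signal, M, levels):
    """Split a 1D signal into M subbands per level, recursing on the approximation."""
    H = dct_filter_bank(M)
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % M:
            raise ValueError("signal length must be divisible by M at every level")
        blocks = approx.reshape(-1, M)   # non-overlapping blocks = downsampling by M
        sub = blocks @ H.T               # column j holds the coefficients of subband j
        approx = sub[:, 0]
        details.append([sub[:, j] for j in range(1, M)])
    return approx, details

rng = np.random.default_rng(0)
signal = rng.standard_normal(81)
approx, details = m_band_decompose(signal, M=3, levels=2)

# With orthonormal filters the total energy is preserved across subbands.
energy_in = float(np.sum(signal ** 2))
energy_out = float(np.sum(approx ** 2)
                   + sum(np.sum(d ** 2) for level in details for d in level))
print(approx.size, energy_in, energy_out)
```

For images, the same split would be applied along rows and then columns, giving M × M subbands per level.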

Local feature extraction and classification

For HFWR, we need features for classification; these features are extracted from the subband wavelet coefficients. Fig. 11 presents the stages of feature extraction. As shown in Fig. 11, the first stage decomposes the word image into a number of subband images using the M-band shift invariant packet wavelet, the second stage extracts the local or global features from the subband images using one of the energy measures defined in Table 2, the third stage stores the feature vectors in descending order
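The second and third stages can be sketched as follows. Table 2 is not reproduced in this excerpt, so the two energy measures used here (mean absolute value and mean square of the coefficients) are assumptions, as are the function names `subband_energy` and `energy_feature_vector` and the toy subbands. The sketch computes one energy value per subband and sorts the values in descending order.

```python
import numpy as np

def subband_energy(coeffs, measure="norm1"):
    """Energy of one subband under an assumed measure (not taken from Table 2)."""
    c = np.asarray(coeffs, dtype=float)
    if measure == "norm1":
        return float(np.mean(np.abs(c)))   # mean absolute value
    if measure == "norm2":
        return float(np.mean(c ** 2))      # mean square
    raise ValueError(f"unknown energy measure: {measure}")

def energy_feature_vector(subbands, measure="norm1"):
    """One energy value per subband, sorted in descending order."""
    energies = np.array([subband_energy(s, measure) for s in subbands])
    return np.sort(energies)[::-1]

# Three toy subbands standing in for wavelet coefficient images.
subbands = [
    np.array([[1.0, -1.0], [1.0, -1.0]]),
    np.array([[3.0, -3.0], [3.0, -3.0]]),
    np.array([[2.0, 2.0], [-2.0, 2.0]]),
]
features = energy_feature_vector(subbands)

# Circularly shifting the rows of every subband leaves the features unchanged.
shifted = energy_feature_vector([np.roll(s, 1, axis=0) for s in subbands])
print(features, shifted)
```

Because each value is an average over the whole subband, the features do not depend on where the coefficients sit inside the subband, only on their magnitudes.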

Experimental results

The effectiveness of the proposed HFWR has been tested via several experiments using a set of handwritten word images from 12 different people. We carried out two major experiments with the following objectives: (1) investigating the HFWR performance with different energy measures, and (2) comparing the proposed method with two other pattern recognition schemes. We prepared 100 different handwritten word images from 12 different people, and generated from each one seven word images with different
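The Introduction states that the feature vectors are classified with a Mahalanobis classifier. A minimal sketch of such a classifier is given below, under assumptions that are not from the paper: a single covariance matrix pooled over all classes, a small ridge term for numerical stability, a hypothetical class name `MahalanobisClassifier`, and synthetic two-dimensional data in place of the wavelet energy features.

```python
import numpy as np

class MahalanobisClassifier:
    """Nearest-class-mean classifier under a pooled-covariance Mahalanobis distance."""

    def fit(self, X, y, ridge=1e-6):
        X = np.asarray(X, dtype=float)
        self.classes_, idx = np.unique(np.asarray(y), return_inverse=True)
        self.means_ = np.array([X[idx == k].mean(axis=0)
                                for k in range(len(self.classes_))])
        centered = X - self.means_[idx]          # remove each sample's class mean
        dof = max(len(X) - len(self.classes_), 1)
        cov = centered.T @ centered / dof        # pooled within-class covariance
        cov += ridge * np.eye(X.shape[1])        # small ridge keeps the inverse stable
        self.inv_cov_ = np.linalg.inv(cov)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means_[None, :, :]
        # Squared Mahalanobis distance from every sample to every class mean.
        d2 = np.einsum("nkd,de,nke->nk", diff, self.inv_cov_, diff)
        return self.classes_[np.argmin(d2, axis=1)]

rng = np.random.default_rng(1)
train_X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                     rng.normal(5.0, 0.5, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)
clf = MahalanobisClassifier().fit(train_X, train_y)
predicted = clf.predict([[0.2, -0.1], [4.8, 5.3]])
print(predicted)
```

In a word recognition setting, each class would correspond to one lexicon entry and each row of `train_X` to the energy feature vector of one training image of that word.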

Ali Broumandnia was born in Esfahan, Iran. He received the B.Sc. degree in Computer Hardware Engineering from the Esfahan University of Technology in Esfahan, Iran, in 1992, the M.Sc. degree in Computer Hardware Engineering from the Iran University of Science and Technology in Tehran, Iran, in 1995, and Ph.D. degree in Computer Engineering from the Islamic Azad University-Science and Research Branch in Tehran, Iran, in 2006.

References (43)

Phillip E. Mitchell et al., Newspaper layout analysis incorporating connected component separation, Image and Vision Computing (2004)
J. You et al., Classification and segmentation of rotated and scaled textured images using texture ‘tuned’ masks, Pattern Recognition (1993)
G. Chen et al., Invariant Fourier-wavelet descriptor for pattern recognition, Pattern Recognition (1999)
P. Wunsch et al., Wavelet descriptors for multiresolution recognition of handprinted characters, Pattern Recognition (1995)
S.-S. Wang et al., Invariant pattern recognition by moment Fourier descriptor, Pattern Recognition (1994)
R.J. Prokop et al., A survey of moment-based techniques for unoccluded object representation and recognition, CVGIP: Graphical Models Image Process (1992)
A. Khotanzad et al., Rotation invariant image recognition using features selected via a systematic method, Pattern Recognition (1990)
I. Cohen et al., Orthonormal shift-invariant wavelet packet decomposition and representation, IEEE Transactions on Signal Processing (1997)
Chao Kan et al., Invariant character recognition with Zernike and orthogonal Fourier–Mellin moments, Pattern Recognition (2002)
M.A. O’Hair, M. Kabrisky, Recognizing whole words as single symbols, in: Proceeding First International Conference...
S.J. Soltysiak, Visual information in word recognition: word shape or letter identities? in: Proceeding Workshop...
Tal Steinherz et al., Offline cursive script word recognition – a survey, International Journal on Document Analysis and Recognition (1999)
M. Leung, A.M. Peterson, Scale and Rotation Invariant Texture Classification, in: Proceeding International Conference...
F.S. Cohen et al., Classification of rotated and scaled textured images using Gaussian Markov random field models, IEEE Transactions on Pattern Analysis and Machine Intelligence (1991)
J.-L. Chen et al., Rotation and gray scale transform invariant texture identification using wavelet decomposition and hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence (1994)
W.R. Wu et al., Rotation and gray-scale transform-invariant texture classification using spiral resampling, subband decomposition, and hidden Markov model, IEEE Transactions on Image Processing (1996)
Chi-Man Pun et al., Extraction of shift invariant wavelet features for classification of image with different sizes, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
H. Xiong et al., A translation- and scale-invariant adaptive wavelet transform, IEEE Transactions on Image Processing (2000)
A. Khotanzad et al., Classification of invariant image representations using a neural network, IEEE Transactions on Acoustics, Speech, and Signal Processing (1990)
S.J. Perantonis et al., Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers, IEEE Transactions on Neural Networks (1992)
O. Rashkovskiy et al., Scale, rotation, and shift invariant wavelet transform, Proceedings SPIE (1994)

From 1993 through 1995 he worked on intelligent transportation control with image processing and designed the Automatic License Plate Recognition system for Tehran Control Traffic Company. Since 1996, he has been with the Department of Computer Engineering, Islamic Azad University-South Tehran Branch, where he is currently an Assistant Professor. Since 2003, he has been with the Department of Electrical, Computer Engineering and Information Technology, Islamic Azad University-Qazvin Branch, where he is currently a lecturer. He has published over 30 computer books, journal and conference papers. He is interested in Persian/Arabic character recognition and segmentation, Persian/Arabic document segmentation, medical imaging, signal and image processing, and wavelet analysis. He is a reviewer for some international journals and conferences.