Persian/Arabic handwritten word recognition using M-band packet wavelet transform

https://doi.org/10.1016/j.imavis.2007.09.004

Abstract

The extraction of rotation and scale invariant features is an essential problem in document image analysis. This paper proposes an effective rotation and scale invariant holistic handwritten word recognition scheme. The approach uses the M-band packet wavelet transform to extract a feature vector from a Farsi word image. The extracted global and local features are exploited to recognize handwritten words from a limited-size lexicon. To obtain rotation and scale invariant features, a polar transform is first applied to the word image to eliminate rotation and scale effects. This produces an M-row shifted polar image, which is passed to a row shift invariant M-band wavelet packet transform to eliminate the row shift effects. The output wavelet coefficients are therefore rotation and scale invariant. For each subband of these wavelet coefficients, a set of local energy features is computed, and feature vectors are extracted from the subbands. The proposed polar M-band wavelet features have been tested by employing a Mahalanobis classifier to classify a set of distinct natural handwritten Farsi words. We compared the proposed scheme with two well-known rotation invariant methods: Fourier-wavelet and Zernike moments. The experimental results show that the proposed algorithm improves the recognition rate by about 12 percent.

Introduction

With the progress in information technology and the increased demand for information, the number of documents containing information keeps growing. Despite the use of electronic documents, the number of printed or handwritten documents has never decreased. However, it is complicated and cumbersome to store and retrieve this ever-increasing volume of paper documents. Electronic documents, on the other hand, have several advantages in storage, retrieval, search and updating. Document image analysis covers the algorithms for transforming documents into electronic format.

Document image analysis has been a topic of research for almost three decades. A large number of research papers and reports have already been published on Latin, Chinese and Japanese characters. However, little work has been done on the automatic recognition of Farsi documents, mostly because of the cursive nature of Farsi writing. Therefore, this area of research is still an open field.

Document image analysis involves specifying the geometry of the maximal homogeneous regions and classifying them into text, image, table, drawing, etc. Images are then compressed, tables are reconstructed, and drawings are vectorized. Text regions are segmented into isolated words and recognized. This paper focuses on the word recognition part of document image analysis.

There are two approaches to word recognition: analytical and holistic. The first treats a word as a collection of simpler subunits such as characters and proceeds by segmenting the word into these units, identifying the units, and building a word-level interpretation using the lexicon. The second treats the word as a single, indivisible entity and attempts to recognize it using features of the word as a whole [1], [2], [3], [4]. This approach is referred to as word-based or holistic and is inspired in part by psychological studies of human reading, which indicate that humans use features of word shape such as length, ascenders, and descenders in reading (Fig. 1). Holistic strategies employ top-down approaches for recognizing the whole word, which eliminates the segmentation problem.

This paper presents a novel holistic scheme for handwritten Farsi word recognition (HFWR) via packet wavelet transform. Applications of HFWR include the recognition of handwritten check amounts, postal addresses, forms, and license plates, and the automatic filing of faxes. The handwritten text must be located, extracted, made free of artifacts, separated into lines if necessary, and, finally, into individual words before recognition. These steps are generally nontrivial. This paper assumes that the complex task of segmenting the handwritten word or phrase of interest from its surroundings has already been accomplished by prior processes. The tasks of segmentation and recognition are generally accomplished sequentially based on packet wavelet features of the word image. This paper focuses on the recognition of the isolated word or phrase using the appropriate lexicon. Fig. 2 depicts an example in which a word is passed to the recognition system, which outputs the possible candidates for the input based on the lexicon.

This paper proposes an effective holistic scheme for rotation and scale invariant handwritten word recognition using the M-band packet wavelet transform. The approach uses the M-band packet wavelet transform to extract a feature vector from a Farsi word image. The global and local features of the word image are exploited to recognize handwritten Farsi words from a limited-size lexicon. As the size of the lexicon grows, the complexity of the algorithms increases linearly because of the need for a larger search space and a more complex pattern representation. Additionally, the recognition rate decreases rapidly because the between-class variance in the feature space decreases.

The extraction of rotation and scale invariant features is always an important problem in content-based image analysis. Proposals can be found in the literature that specifically address the problem of rotation and scale invariant image analysis [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. Chen and Bui [15] proposed an invariant Fourier-wavelet descriptor for Chinese character recognition, but its performance may drop significantly for databases of more general images. Several authors [16], [17], [18], [19] extract 1D invariant features from 2D patterns. The advantage of this approach is that it saves space in the database and reduces the time needed to match against the whole database. The drawback is a low recognition rate, because less information from the original pattern is retained.

The use of moment invariant features for identification and inspection of 2D shapes has received much attention [20]. Various types of image moments, including geometrical, Legendre, Zernike, pseudo-Zernike, Fourier–Mellin, and complex moments, have been evaluated in terms of noise sensitivity, information redundancy, and capability of image description [21]. That evaluation found Zernike moments to be superior. Khotanzad and Hong [22] have also compared Hu’s moment invariants and the magnitudes of Zernike moments in rotation invariant recognition of characters and shapes. Their results clearly show that Zernike moments are superior to Hu’s moments. However, the invariant features of Zernike moments are sensitive to noise, and once the number of features reaches a certain limit, classification accuracy may not increase with additional features. The accuracy of moment invariant features may also drop significantly when the number of image classes increases.

This paper introduces an effective rotation and scale invariant handwritten word recognition scheme based on the M-band packet wavelet transform. To obtain rotation and scale invariant features, a polar transform [23], [24] is applied to the word image to eliminate the rotation and scaling effects. This at the same time produces an M-row shifted polar image, which is then passed to a row shift invariant M-band wavelet packet transform to eliminate the row shift effects. The output wavelet coefficients are therefore rotation and scale invariant. Then, for each subband of these wavelet coefficients, a set of local energy features is computed. The local energy features were only computed for the coefficients of the last wavelet subband. We extracted feature vectors from the subbands of wavelet coefficients for M = 2, 3 and 4, which gives three independent feature vectors for handwritten word classification, and compared the classification results for the three values of M. The proposed polar M-band wavelet feature has been tested using a Mahalanobis classifier to classify a set of distinct natural handwritten Farsi words provided by different people with different handwriting styles.
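The chain described above (polar resampling, a row shift invariant feature, then Mahalanobis classification) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function names, grid sizes, and number of radial bands are hypothetical, and the M-band wavelet packet stage is replaced by a much simpler stand-in (energy pooled over all angles in each radial band) that shares only the row shift invariance property. The pooled inverse covariance passed to the classifier is likewise an assumption.

```python
import numpy as np


def polar_transform(img, n_angles=64, n_radii=32):
    """Resample a grayscale image onto an (angle, radius) grid.

    Rows index the angle and columns the radius, so rotating the input
    about its centre circularly shifts the rows of the output. The radius
    is normalised by the largest circle that fits inside the image.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    theta = 2.0 * np.pi * np.arange(n_angles) / n_angles
    radius = r_max * (np.arange(n_radii) + 0.5) / n_radii
    y = cy + radius[None, :] * np.sin(theta[:, None])
    x = cx + radius[None, :] * np.cos(theta[:, None])
    # Bilinear interpolation written out with NumPy only.
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    fy, fx = y - y0, x - x0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def row_shift_invariant_energy(polar, n_bands=4):
    """Stand-in feature (NOT the M-band wavelet packet): mean energy per
    radial band, pooled over every row, hence unchanged by row shifts."""
    bands = np.array_split(polar ** 2, n_bands, axis=1)
    return np.array([band.mean() for band in bands])


def mahalanobis_classify(x, class_means, cov_inv):
    """Return the index of the class mean nearest to x in Mahalanobis
    distance, assuming one shared inverse covariance matrix."""
    diffs = np.asarray(class_means, dtype=float) - np.asarray(x, dtype=float)
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return int(np.argmin(d2))


# Usage: a 90-degree rotation only shifts the rows of the polar image,
# so the pooled energy features of the two images agree.
rng = np.random.default_rng(0)
word = rng.random((33, 33))          # placeholder for a word image
f_original = row_shift_invariant_energy(polar_transform(word))
f_rotated = row_shift_invariant_energy(polar_transform(np.rot90(word)))
print(np.allclose(f_original, f_rotated))
```

Pooling over all angles discards most of the angular structure that a shift invariant wavelet packet decomposition would keep, so this sketch only demonstrates why a row shift invariant stage is needed after the polar transform, not how discriminative the resulting features are.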

The outline of this paper is as follows. The next section briefly introduces and reviews the standard 2D M-band wavelet packet decomposition and filter bank design techniques. Section 3 introduces our proposed scheme for extracting the rotation and scale invariant polar M-band wavelet energy from any given image. Section 4 presents the classification results for different kinds of energy measures and different bands of energy features on handwritten testing data sets. Section 4 also compares the classification performance and noise robustness of our proposed method with other rotation invariant handwritten word recognition methods. The final section draws the conclusions.

Section snippets

M-band packet wavelet transforms

The wavelet transform maps a function f(x) ∈ L²(R) onto a scale–space plane. The wavelets are obtained from a single prototype function ψ(x) by scalings a and shifts b. The continuous wavelet transform of a function is given as:

$$W_f^a(b) = \int f(x)\,\psi_{a,b}(x)\,dx$$

Both the practical and mathematical interpretations of wavelets seem to be best served by using the concept of resolution to define the effects of changing scale. To do this, we will start with a scaling function φ(t) rather than directly with the wavelet ψ(

Rotation and scale invariant M-band packet wavelet transform

The discrete M-band wavelet transform (DMWT) has been shown to be useful for image analysis [28], [26], [33], because wavelets have finite duration, which provides both frequency and spatial locality, and admit an efficient implementation. The hierarchical one-dimensional discrete M-band wavelet transform uses a set of M filters derived from wavelet functions to decompose the original signal into M subbands: details and approximation. The decomposition process is recursively applied to the approximation

Local feature extraction and classification

For HFWR, we need features for classification; these features are extracted from the subband wavelet coefficients. Fig. 11 presents the stages of feature extraction. As shown in Fig. 11, the first stage decomposes the word image into a number of subband images with the M-band shift invariant packet wavelet; the second stage extracts the local or global features from the subband images with one of the energy measures defined in Table 2; the third stage stores the feature vectors in descending order

Experimental results

The effectiveness of the proposed HFWR has been tested in several experiments using a set of handwritten word images from 12 different people. We carried out two major experiments with the following objectives: (1) investigating the HFWR performance with different energy measures, and (2) comparing the proposed method with two other pattern recognition schemes. We prepared 100 different handwritten word images from 12 different people and generated from each one seven word images with different

References (43)

  • S.J. Soltysiak, Visual information in word recognition: word shape or letter identities? in: Proceeding Workshop...
  • Tal Steinherz et al., Offline cursive script word recognition – a survey, International Journal on Document Analysis and Recognition (1999)
  • M. Leung, A.M. Peterson, Scale and Rotation Invariant Texture Classification, in: Proceeding International Conference...
  • F.S. Cohen et al., Classification of rotated and scaled textured images using Gaussian Markov random field models, IEEE Transactions on Pattern Analysis and Machine Intelligence (1991)
  • J.-L. Chen et al., Rotation and gray scale transform invariant texture identification using wavelet decomposition and hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence (1994)
  • W.R. Wu et al., Rotation and gray-scale transform-invariant texture classification using spiral resampling, subband decomposition, and hidden Markov model, IEEE Transactions on Image Processing (1996)
  • Chi-Man Pun et al., Extraction of shift invariant wavelet features for classification of image with different sizes, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
  • H. Xiong et al., A translation- and scale-invariant adaptive wavelet transform, IEEE Transactions on Image Processing (2000)
  • A. Khotanzad et al., Classification of invariant image representations using a neural network, IEEE Transactions on Acoustics, Speech, and Signal Processing (1990)
  • S.J. Perantonis et al., Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers, IEEE Transactions on Neural Networks (1992)
  • O. Rashkovskiy et al., Scale, rotation, and shift invariant wavelet transform, Proceedings of SPIE (1994)

Ali Broumandnia was born in Esfahan, Iran. He received the B.Sc. degree in Computer Hardware Engineering from the Esfahan University of Technology in Esfahan, Iran, in 1992, the M.Sc. degree in Computer Hardware Engineering from the Iran University of Science and Technology in Tehran, Iran, in 1995, and the Ph.D. degree in Computer Engineering from the Islamic Azad University-Science and Research Branch in Tehran, Iran, in 2006.

From 1993 through 1995 he worked on intelligent transportation control with image processing and designed the Automatic License Plate Recognition system for Tehran Control Traffic Company. Since 1996, he has been with the Department of Computer Engineering, Islamic Azad University-South Tehran Branch, where he is currently an Assistant Professor. Since 2003, he has been with the Department of Electrical, Computer Engineering and Information Technology, Islamic Azad University-Qazvin Branch, where he is currently a lecturer. He has published over 30 computer books and journal and conference papers. His interests include Persian/Arabic character recognition and segmentation, Persian/Arabic document segmentation, medical imaging, signal and image processing, and wavelet analysis. He is a reviewer for several international journals and conferences.
