Persian/Arabic handwritten word recognition using M-band packet wavelet transform
Introduction
With the progress of information technology and the growing demand for information, the number of documents has increased steadily. Despite the spread of electronic documents, the volume of printed and handwritten documents has not decreased. Storing and retrieving this ever-growing body of paper documents is complicated and cumbersome, whereas electronic documents have clear advantages in storage, retrieval, search, and updating. Document image analysis covers the algorithms that transform paper documents into electronic form.
Document image analysis has been a topic of research for almost three decades. A large number of research papers and reports have been published on Latin, Chinese, and Japanese characters. However, relatively little work has been done on the automatic recognition of Farsi documents, mostly because Farsi script is cursive. This area of research therefore remains open.
Document image analysis involves determining the geometry of the maximal homogeneous regions of a page and classifying them as text, image, table, drawing, etc.; images are then compressed, tables reconstructed, and drawings vectorized. Text regions are segmented into isolated words, which are then recognized. This paper focuses on the word recognition stage of document image analysis.
There are two approaches to word recognition: analytical and holistic. The analytical approach treats a word as a collection of simpler subunits, such as characters; it segments the word into these units, identifies them, and builds a word-level interpretation using the lexicon. The holistic approach treats the word as a single, indivisible entity and attempts to recognize it using features of the word as a whole [1], [2], [3], [4]. This word-based, or holistic, approach is inspired in part by psychological studies of human reading, which indicate that humans use word-shape features such as length, ascenders, and descenders when reading (Fig. 1). Holistic strategies recognize the whole word top-down, which eliminates the segmentation problem.
This paper presents a novel holistic scheme for handwritten Farsi word recognition (HFWR) based on the packet wavelet transform. Applications of HFWR include the recognition of handwritten check amounts, postal addresses, forms, and license plates, and the automatic filing of faxes. Before recognition, the handwritten text must be located, extracted, cleaned of artifacts, separated into lines if necessary, and finally split into individual words. These steps are generally nontrivial. This paper assumes that the complex task of segmenting the handwritten word or phrase of interest from its surroundings has already been accomplished by prior processes. Segmentation and recognition are generally performed sequentially, and recognition here is based on packet wavelet features of the word image. This paper focuses on recognizing the isolated word or phrase using an appropriate lexicon. Fig. 2 depicts an example in which a word is passed to the recognition system, which outputs the possible lexicon candidates for that input.
This paper proposes an effective holistic scheme for rotation- and scale-invariant handwritten word recognition using the M-band packet wavelet transform, which is used to extract a feature vector from each Farsi word image. Global and local features of the word image are exploited to recognize handwritten Farsi words from a limited-size lexicon. As the lexicon grows, the complexity of the algorithms increases linearly, because of the larger search space and the more complex pattern representation required. In addition, recognition rates drop rapidly, because the between-class variance in the feature space decreases.
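The lexicon-driven output of Fig. 2 (a ranked list of candidate words) and the linear growth of cost with lexicon size can be illustrated with a small nearest-class-mean ranker. The paper names a Mahalanobis classifier, but this excerpt does not say how its covariance is estimated, so the sketch below assumes one pooled within-class covariance and one mean vector per lexicon word. All function names and the toy data are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(classes):
    """Pooled within-class covariance (assumption: one matrix shared by all words)."""
    dim = len(next(iter(classes.values()))[0])
    cov = [[0.0] * dim for _ in range(dim)]
    dof = 0
    for rows in classes.values():
        mu = mean_vector(rows)
        for r in rows:
            d = [a - b for a, b in zip(r, mu)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in cov]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                factor = a[r][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]


def mahalanobis_sq(x, mu, inv_cov):
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu)."""
    d = [a - b for a, b in zip(x, mu)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))


def rank_lexicon(x, classes, k=3):
    """Return the k lexicon words whose class means are nearest to x.

    One distance is computed per lexicon entry, so the cost grows linearly
    with the size of the lexicon.
    """
    inv_cov = invert(pooled_covariance(classes))
    scored = sorted(
        (mahalanobis_sq(x, mean_vector(rows), inv_cov), word)
        for word, rows in classes.items()
    )
    return [word for _, word in scored[:k]]
```

For example, with `classes = {"word_a": [[0, 0], [1, 0], [0, 1], [1, 1]], "word_b": [[5, 5], [6, 5], [5, 6], [6, 6]], "word_c": [[10, 0], [11, 0], [10, 1], [11, 1]]}`, the call `rank_lexicon([0.4, 0.6], classes, k=2)` returns `["word_a", "word_b"]`. A pooled covariance keeps the estimate stable when each word has only a few training samples.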
The extraction of rotation- and scale-invariant features is an important problem in content-based image analysis, and several proposals in the literature specifically address rotation- and scale-invariant image analysis [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. Chen and Bui [15] proposed an invariant Fourier-wavelet descriptor for Chinese character recognition; however, its performance may drop significantly on databases of more general images. Several authors [16], [17], [18], [19] extract 1D invariant features from 2D patterns. The advantage of this approach is that it saves database storage and reduces the time needed to match against the whole database. The drawback is a low recognition rate, because less information from the original pattern is retained.
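Both the appeal and the drawback of 1D invariant features can be seen in ring projection, one common member of this family (we do not claim it is what [16]–[19] use). Summing pixels on concentric rings gives a short signature that is unchanged by rotation about the centre, exactly so for 90° rotations on the pixel grid and approximately for other angles, but it discards where on each ring the ink lies. The helper names below are hypothetical.

```python
import math


def ring_projection(img):
    """Sum pixel values over concentric rings around the image centre.

    Rotating the image about its centre moves pixels along rings, so the
    1D profile is unchanged; the arrangement inside each ring is lost.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    rings = [0] * (int(math.sqrt(cy * cy + cx * cx)) + 2)
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            rings[int(round(math.sqrt(dy * dy + dx * dx)))] += img[y][x]
    return rings


def rotate90(img):
    """Rotate a 2D list clockwise by 90 degrees (exact on the pixel grid)."""
    return [list(row) for row in zip(*img[::-1])]
```

For a small binary pattern `img`, `ring_projection(rotate90(img))` equals `ring_projection(img)`, which is the desired invariance. A mirrored copy `[row[::-1] for row in img]`, a different pattern, produces the same signature as well, which is the loss of information behind the lower recognition rate.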
The use of moment invariant features for the identification and inspection of 2D shapes has received much attention [20]. Various types of image moments, including geometric, Legendre, Zernike, pseudo-Zernike, Fourier–Mellin, and complex moments, have been evaluated in terms of noise sensitivity, information redundancy, and capability for image description [21], and Zernike moments were found to be superior. Khotanzad and Hong [22] also compared Hu’s moment invariants with the magnitudes of Zernike moments in rotation-invariant recognition of characters and shapes; their results clearly show that Zernike moments are superior to Hu’s moments. However, invariant features based on Zernike moments are sensitive to noise, and once the number of features reaches a certain limit, adding more features may not increase classification accuracy. The accuracy of moment invariant features may also drop significantly when the number of image classes increases.
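To make "moment invariant feature" concrete, the sketch below computes the first two of Hu's invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11², from normalized central moments η_pq = μ_pq / μ00^(1+(p+q)/2). It is only an illustration of the moment family discussed here, not the paper's method, and Zernike moments are not implemented. The function names are hypothetical.

```python
import math


def _moment(img, p, q, xc=0.0, yc=0.0):
    """Moment: sum over pixels of (x - xc)^p * (y - yc)^q * I(y, x)."""
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_phi1_phi2(img):
    """First two of Hu's seven moment invariants for a grey-level image (2D list)."""
    m00 = _moment(img, 0, 0)
    xc, yc = _moment(img, 1, 0) / m00, _moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moments remove translation; dividing by m00^(1+(p+q)/2) removes scale.
        return _moment(img, p, q, xc, yc) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


def rotate90(img):
    """Rotate a 2D list clockwise by 90 degrees (exact on the pixel grid)."""
    return [list(row) for row in zip(*img[::-1])]
```

A 90° rotation swaps η20 and η02 and negates η11, so both values are unchanged; padding the image with a blank column (a translation) leaves them unchanged too. On real scanned words, rotation by arbitrary angles and rescaling are only approximately invariant, because of resampling, which is one source of the noise sensitivity noted above.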
This paper introduces an effective rotation- and scale-invariant handwritten word recognition method based on the M-band packet wavelet transform. To obtain rotation- and scale-invariant features of a word image, a polar transform [23], [24] is first applied to eliminate rotation and scaling effects, but at the same time it produces an M-row shifted polar image, which is then passed to a row-shift-invariant M-band wavelet packet transform to eliminate the row shift effects. The output wavelet coefficients are therefore rotation and scale invariant. Then, for each subband of these wavelet coefficients, a set of local energy features is computed; the local energy features are computed only for the coefficients of the last wavelet subband. We extract feature vectors from the wavelet subbands for M = 2, 3, and 4, which yields three independent feature vectors for handwritten word classification, and we compare the classification results for the three values of M. The proposed polar M-band wavelet features have been tested with a Mahalanobis classifier on a set of distinct natural handwritten Farsi words written by different people with different handwriting styles.
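Why a polar transform turns rotation and scaling into shifts can be checked numerically. The excerpt cites [23], [24] for the transform but does not give its sampling grid, so this sketch assumes a log-polar grid whose rows are angles and whose columns are log-spaced radii. It also samples a continuous test function instead of a pixel image to avoid interpolation error. Under these assumptions, rotating by a multiple of the angular step is exactly a circular shift of rows, and scaling by a power of the radial ratio is a shift of columns. All names are hypothetical.

```python
import math


def log_polar_samples(f, n_theta=16, n_r=8, r0=0.5, q=1.25):
    """Sample f(x, y) on a log-polar grid.

    Row j holds angle 2*pi*j/n_theta and column i holds radius r0 * q**i, so a
    rotation becomes a circular shift of rows and a scaling by q**t becomes a
    shift of t columns.
    """
    grid = []
    for j in range(n_theta):
        theta = 2 * math.pi * j / n_theta
        grid.append([f(r0 * q ** i * math.cos(theta), r0 * q ** i * math.sin(theta))
                     for i in range(n_r)])
    return grid


def rotated(f, alpha):
    """g(x, y) = f(R(-alpha)(x, y)), i.e. f rotated by alpha about the origin."""
    c, s = math.cos(alpha), math.sin(alpha)
    return lambda x, y: f(c * x + s * y, -s * x + c * y)


def scaled(f, k):
    """g(x, y) = f(x / k, y / k), i.e. f magnified by k about the origin."""
    return lambda x, y: f(x / k, y / k)
```

With `base = log_polar_samples(f)`, the samples of `rotated(f, 2 * math.pi * 3 / 16)` satisfy `rot[j][i] ≈ base[(j - 3) % 16][i]`, and those of `scaled(f, 1.25 ** 2)` satisfy `big[j][i] ≈ base[j][i - 2]` for `i >= 2`. The remaining row shift is what a row-shift-invariant transform must then remove. On real word images, interpolation and the finite radial range make both relations approximate.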
The outline of this paper is as follows. The next section briefly reviews the standard 2D M-band wavelet packet decomposition and filter bank design techniques. Section 3 introduces our proposed scheme for extracting rotation- and scale-invariant polar M-band wavelet energy features from any given image. Section 4 presents the classification results on handwritten test data sets for different energy measures and different bands of energy features; it also compares the classification performance and noise robustness of the proposed method with those of other rotation-invariant handwritten word recognition methods. The final section draws conclusions.
M-band packet wavelet transforms
The wavelet transform maps a function f(x) ∈ L²(ℝ) onto a scale–space plane. The wavelets are obtained from a single prototype function ψ(x) by scaling a and shifts b. The continuous wavelet transform of a function is given as:
$$W_f(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(x)\, \psi^{*}\!\left(\frac{x-b}{a}\right) dx$$
Both the practical and the mathematical interpretations of wavelets seem to be best served by using the concept of resolution to define the effects of changing scale. To do this, we will start with a scaling function φ(t) rather than directly with the wavelet ψ(
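A single continuous wavelet coefficient can be approximated by direct numerical integration. This sketch assumes the common normalization W(a, b) = |a|^(-1/2) ∫ f(x) ψ*((x − b)/a) dx and uses the real-valued Haar wavelet only because its values are easy to verify by hand; it is not the wavelet used in the paper, and the function names are hypothetical.

```python
import math


def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def cwt_coefficient(f, a, b, psi=haar, x_min=-4.0, x_max=4.0, n=8000):
    """Midpoint-rule approximation of
    W(a, b) = |a|^(-1/2) * integral of f(x) * psi((x - b) / a) dx
    for a real-valued wavelet psi (so its complex conjugate is psi itself)."""
    dx = (x_max - x_min) / n
    total = 0.0
    for k in range(n):
        x = x_min + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += f(x) * psi((x - b) / a)
    return total * dx / math.sqrt(abs(a))
```

Because ψ has zero mean, a constant signal gives W ≈ 0. Correlating ψ with itself at a = 1, b = 0 gives W ≈ 1 (unit energy). For the dilated signal f(x) = ψ(x/2) at a = 2, the result is 2/√2 = √2, which shows the role of the 1/√|a| factor.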
Rotation and scale invariant M-band packet wavelet transform
The discrete M-band wavelet transform (DMWT) has been shown to be useful for image analysis [28], [26], [33] because wavelets have finite duration, which provides both frequency and spatial localization as well as an efficient implementation. The hierarchical one-dimensional DMWT uses a set of M filters derived from wavelet functions to decompose the original signal into M subbands: one approximation subband and M − 1 detail subbands. The decomposition process is recursively applied to the approximation
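The hierarchical scheme just described (M filters, downsampling by M, recursion on the approximation only) can be sketched in a few lines. The excerpt does not give the paper's filter bank, so this sketch assumes the simplest orthonormal M-band bank: the rows of the M × M DCT-II matrix used as length-M filters. Row 0 is constant and acts as the scaling (lowpass) filter, and for M = 2 the bank reduces to Haar. Real M-band wavelet filters are longer and designed for regularity. All names are hypothetical.

```python
import math


def dct_filter_bank(m):
    """Orthonormal M x M DCT-II matrix. Row 0 is constant (the lowpass/scaling
    filter); rows 1..M-1 sum to zero (the wavelet/detail filters)."""
    bank = []
    for k in range(m):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / m)
        bank.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * m)) for n in range(m)])
    return bank


def mband_analysis(signal, m):
    """One decimated level: M filters, downsampling by M -> M subbands."""
    if len(signal) % m:
        raise ValueError("signal length must be a multiple of M")
    bank = dct_filter_bank(m)
    blocks = [signal[i:i + m] for i in range(0, len(signal), m)]
    return [[sum(h * x for h, x in zip(filt, blk)) for blk in blocks] for filt in bank]


def mband_dwt(signal, m, levels):
    """Hierarchical M-band DWT: only the approximation (subband 0) is re-split."""
    approx, details = list(signal), []
    for _ in range(levels):
        bands = mband_analysis(approx, m)
        approx = bands[0]
        details.append(bands[1:])  # the M - 1 detail subbands of this level
    return approx, details
```

For a length-27 signal, `mband_dwt(signal, 3, 2)` returns a 3-sample approximation and two levels of two detail subbands each. Because the bank is orthonormal, the total energy of all outputs equals the energy of the input, and a constant signal produces zero detail coefficients.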
Local feature extraction and classification
For HFWR, classification features are extracted from the subband wavelet coefficients. Fig. 11 presents the stages of feature extraction. As shown in Fig. 11, the first stage decomposes the word image into a number of subband images with the M-band shift-invariant packet wavelet; the second stage extracts local or global features from the subband images with one of the energy measures defined in Table 2; the third stage stores the feature vectors in descending order
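The three stages of Fig. 11 (packet decomposition, energy measurement, descending sort) can be sketched on one row of a polar image. Several assumptions stand in for details the excerpt omits. Shift invariance is obtained by undecimated circular filtering, a swapped-in technique that may differ from the paper's own shift-invariant packet transform. The filters are DCT-II rows. Table 2 is not reproduced here, so the energy measures are hypothetical stand-ins. "Descending order" is read as sorting the energy values. Since circular filtering commutes with circular shifts, the sorted energies do not change when the row is circularly shifted.

```python
import math


def dct_filter_bank(m):
    """Orthonormal M x M DCT-II rows used as M length-M filters (assumption)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / m) * math.cos(math.pi * (2 * n + 1) * k / (2 * m))
             for n in range(m)] for k in range(m)]


def circular_filter(signal, filt):
    """Undecimated circular filtering: y[i] = sum_k filt[k] * x[(i + k) mod N]."""
    n = len(signal)
    return [sum(h * signal[(i + k) % n] for k, h in enumerate(filt)) for i in range(n)]


def packet_subbands(signal, m, levels):
    """Wavelet packet tree: every node (not just the approximation) is split,
    giving m ** levels leaf subbands, each as long as the input."""
    bank = dct_filter_bank(m)
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [circular_filter(node, filt) for node in nodes for filt in bank]
    return nodes


# Hypothetical stand-ins: the excerpt does not reproduce Table 2.
ENERGY_MEASURES = {
    "mean_abs": lambda c: sum(abs(v) for v in c) / len(c),
    "mean_square": lambda c: sum(v * v for v in c) / len(c),
    "rms": lambda c: math.sqrt(sum(v * v for v in c) / len(c)),
}


def energy_features(signal, m, levels, measure="mean_square"):
    """Energy of every last-level packet subband, sorted in descending order."""
    energy = ENERGY_MEASURES[measure]
    return sorted((energy(band) for band in packet_subbands(signal, m, levels)), reverse=True)
```

Calling `energy_features(row, m, 2)` for m = 2, 3, and 4 yields 4, 9, and 16 features respectively, one feature vector per value of M, mirroring the comparison across M described in the introduction. Only last-level leaves contribute, in line with the statement that local energies are computed for the last subband level.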
Experimental results
The effectiveness of the proposed HFWR has been tested in several experiments using a set of handwritten word images from 12 different people. We carried out two major experiments with the following objectives: (1) investigating the HFWR performance for different energy measures, and (2) comparing the proposed method with two other pattern recognition schemes. We prepared 100 different handwritten word images from 12 different people and generated from each one seven word images with different
References (43)
- et al., Newspaper layout analysis incorporating connected component separation, Image and Vision Computing (2004).
- et al., Classification and segmentation of rotated and scaled textured images using texture ‘tuned’ masks, Pattern Recognition (1993).
- et al., Invariant Fourier-wavelet descriptor for pattern recognition, Pattern Recognition (1999).
- et al., Wavelet descriptors for multiresolution recognition of handprinted characters, Pattern Recognition (1995).
- et al., Invariant pattern recognition by moment Fourier descriptor, Pattern Recognition (1994).
- et al., A survey of moment-based techniques for unoccluded object representation and recognition, CVGIP: Graphical Models and Image Processing (1992).
- et al., Rotation invariant image recognition using features selected via a systematic method, Pattern Recognition (1990).
- et al., Orthonormal shift-invariant wavelet packet decomposition and representation, IEEE Transactions on Signal Processing (1997).
- et al., Invariant character recognition with Zernike and orthogonal Fourier–Mellin moments, Pattern Recognition (2002).
- M.A. O’Hair, M. Kabrisky, Recognizing whole words as single symbols, in: Proceeding First International Conference...
- Offline cursive script word recognition – a survey, International Journal on Document Analysis and Recognition.
- Classification of rotated and scaled textured images using Gaussian Markov random field models, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Rotation and gray scale transform invariant texture identification using wavelet decomposition and hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Rotation and gray-scale transform-invariant texture classification using spiral resampling, subband decomposition, and hidden Markov model, IEEE Transactions on Image Processing.
- Extraction of shift invariant wavelet features for classification of image with different sizes, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- A translation- and scale-invariant adaptive wavelet transform, IEEE Transactions on Image Processing.
- Classification of invariant image representations using a neural network, IEEE Transactions on Acoustics, Speech, and Signal Processing.
- Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers, IEEE Transactions on Neural Networks.
- Scale, rotation, and shift invariant wavelet transform, Proceedings of SPIE.
Ali Broumandnia was born in Esfahan, Iran. He received the B.Sc. degree in Computer Hardware Engineering from the Esfahan University of Technology in Esfahan, Iran, in 1992, the M.Sc. degree in Computer Hardware Engineering from the Iran University of Science and Technology in Tehran, Iran, in 1995, and Ph.D. degree in Computer Engineering from the Islamic Azad University-Science and Research Branch in Tehran, Iran, in 2006.
From 1993 through 1995 he worked on intelligent transportation control using image processing and designed the Automatic License Plate Recognition system for the Tehran Control Traffic Company. Since 1996, he has been with the Department of Computer Engineering, Islamic Azad University-South Tehran Branch, where he is currently an Assistant Professor. Since 2003, he has been with the Department of Electrical, Computer Engineering and Information Technology, Islamic Azad University-Qazvin Branch, where he is currently a lecturer. He has published over 30 computer books and journal and conference papers. He is interested in Persian/Arabic character recognition and segmentation, Persian/Arabic document segmentation, medical imaging, signal and image processing, and wavelet analysis. He is a reviewer for several international journals and conferences.