Holistic Farsi handwritten word recognition using gradient features

In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features, directional and intensity gradient features, are extracted from a vertical stripe that slides across the word image. The feature vector extracted from each stripe is then coded using a Self-Organizing Map (SOM), and each word is modeled by a discrete Hidden Markov Model (HMM). The proposed method was evaluated on the FARSA dataset. The experimental results show that the proposed system with directional gradient features achieves a recognition rate of 69.07% and outperforms the existing methods it was compared with.


Introduction
Cursive handwritten word recognition is a challenging area of pattern recognition because of difficulties such as the high variability of handwriting style and shape, the uncertainty of human writing, skewed or slanted writing, the segmentation of words into characters, and the size of the lexicon [1]. Farsi handwriting, which this paper addresses, is very similar to Arabic in terms of strokes and structure; the only difference is that Farsi has four more characters. Therefore, a Farsi word recognition system can also be used for Arabic words [2].

Handwritten word recognition systems work either online or offline. In an online system, words are written with special tools such as a pen and tablet, so additional information such as writing direction and writing speed is available to facilitate recognition. In the online method, recognition occurs concurrently with writing [3,4,5], whereas in the offline method the handwritten words exist before recognition: the text is first written with a pen on paper and then scanned. Since the offline method does not use additional data-collection tools, recognition is more complex. In this paper we address the problem of recognizing Farsi handwritten words offline.

Several studies have been conducted in this area in recent years. In [6] a holistic system for recognizing Farsi/Arabic handwritten words using right-to-left discrete Hidden Markov Models (HMMs) was proposed. The histogram of chain-code directions in vertical stripes of the word image was used as the feature vector, and the extracted feature vectors were fed to a Kohonen self-organizing vector quantizer [6]. A database of 17000 images of 198 words was used to evaluate this system, and by smoothing the observation symbol probabilities a recognition rate of 65% was achieved [6]. In [7] Dehghan et al. generated a fuzzy codebook from the same feature vectors; a recognition rate of 67% was reported for this system using an HMM classifier [7]. In [2], a method for recognizing handwritten Iranian city names was proposed, in which K-means clustering was used for vector quantization and words were modeled with discrete HMMs; a recognition rate of 80.75% was reported [2]. In [8] the authors proposed an offline Arabic/Farsi handwriting recognition algorithm using an RBF network. Its features were wavelet coefficients extracted from the profiles of the smoothed word image in four standard directions. Evaluated on a database of 3300 images of 30 common Farsi names, this method achieved a 96% recognition rate. AlKhateeb et al. [9] addressed offline Arabic word recognition. They used a set of intensity features to train an HMM classifier for each word and re-ranked the results using structure-like features (including the number of subwords and diacritical marks) to improve recognition; with re-ranking, this method achieved an 89% recognition rate. Imani et al. [10] used chain-code features and the distribution of stroke pixels across a sliding frame as a hybrid feature extraction method, followed by an HMM classifier. This method was applied to the FARSA database [11] and achieved a recognition rate of 68.88%.

In this paper we address the offline Farsi handwritten word recognition problem by extracting gradient features from a sliding window that sweeps across the word image. We propose two recognition systems: one represents the word image using the magnitude of the image gradient, and the other using the direction of the image gradient. In both systems, the feature vector extracted from each sliding window is quantized by a vector quantization algorithm. Then, for each word class, the sequences of code indices obtained from the sample images of that class are used to train a discrete HMM for that class. As the experimental results show, directional gradient features outperform intensity gradient features in terms of recognition rate. Intensity gradient features were previously used for typewritten text recognition in [12]; for handwriting, however, features based on the direction of the script gradient improve the recognition rate. FARSA, a dataset of handwritten Farsi words introduced by Imani et al. [11], is used to evaluate the proposed methods.

The rest of the paper is organized as follows. Section 2 introduces the proposed word recognition system. Section 3 explains the classification features. Section 4 presents the vector quantization used to code the feature vectors. Section 5 describes the classification method. Section 6 reports the experimental results, and Section 7 concludes the paper.

Preprocessing
The preprocessing stage plays a crucial role in word recognition, since it represents different samples of the same word in an invariant manner [3]. It consists of the following steps:
• Binarization and noise removal: The gray-level word image is binarized by thresholding. Connected components whose area is below a predefined threshold are then deleted.
• Cropping: To increase processing speed and reduce memory usage, the binary image is cropped to its bounding box and then resized to a height of 45 pixels.
• Skeletonization and dilation: First, the skeleton of the word image is extracted (Figure 2.b). It is then dilated (Figure 2.c) with a 4×4 square structuring element. This step makes the system independent of stroke width [3].
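The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binarization threshold of 128, the minimum component area, nearest-neighbour resizing with preserved aspect ratio, and the use of Zhang-Suen thinning as the skeletonization algorithm are all our choices, since the text does not specify them. All function names are hypothetical.

```python
import numpy as np
from collections import deque


def binarize(gray, threshold=128):
    """Ink (assumed darker than the background) -> 1, background -> 0."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def remove_small_components(binary, min_area):
    """Delete 8-connected components whose pixel count is below min_area."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = binary.copy()
    for r, c in zip(*np.nonzero(binary)):
        if seen[r, c]:
            continue
        seen[r, c] = True
        queue, comp = deque([(r, c)]), []
        while queue:  # breadth-first flood fill of one component
            y, x = queue.popleft()
            comp.append((y, x))
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        if len(comp) < min_area:
            for y, x in comp:
                out[y, x] = 0
    return out


def crop_and_resize(binary, height=45):
    """Crop to the bounding box, then nearest-neighbour resize to a fixed height."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    box = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    width = max(1, round(box.shape[1] * height / box.shape[0]))  # keep aspect ratio
    ri = np.arange(height) * box.shape[0] // height
    ci = np.arange(width) * box.shape[1] // width
    return box[np.ix_(ri, ci)]


def skeletonize(img):
    """Zhang-Suen thinning (one standard skeletonization algorithm)."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            p = np.pad(img, 1)
            # 8 neighbours P2..P9, clockwise starting from north
            n = [p[:-2, 1:-1], p[:-2, 2:], p[1:-1, 2:], p[2:, 2:],
                 p[2:, 1:-1], p[2:, :-2], p[1:-1, :-2], p[:-2, :-2]]
            count = sum(n)  # number of ink neighbours
            trans = sum((n[i] == 0) & (n[(i + 1) % 8] == 1) for i in range(8))
            p2, _, p4, _, p6, _, p8, _ = n
            if step == 0:
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            delete = (img == 1) & (count >= 2) & (count <= 6) & (trans == 1) & cond
            if delete.any():
                img[delete] = 0
                changed = True
    return img


def dilate_square(img, size=4):
    """Binary dilation with a size x size square structuring element."""
    h, w = img.shape
    before = (size - 1) // 2
    p = np.pad(img, ((before, size - 1 - before), (before, size - 1 - before)))
    out = np.zeros_like(img)
    for dy in range(size):
        for dx in range(size):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def preprocess(gray, min_area=10, height=45):
    binary = remove_small_components(binarize(gray), min_area)
    return dilate_square(skeletonize(crop_and_resize(binary, height)), 4)


# Usage: a 5-pixel-thick horizontal stroke plus one isolated noise pixel.
demo = np.full((20, 40), 255, dtype=np.uint8)
demo[8:13, 5:35] = 0
demo[1, 1] = 0
result = preprocess(demo, min_area=5)
```

After thinning, the stroke is one pixel wide, and the 4×4 dilation gives it a uniform width of 4 pixels regardless of the original pen width, which is the stroke-width independence the text aims for.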

Extracting features from the word image
To present the word image to the recognition system, we need to represent it with discriminative features. We use the numbers of white pixels and of horizontal and vertical edges, together with the histogram of gradient directions.

Sweeping the word image with a sliding stripe
A sliding stripe scans the word image from right to left, dividing it into several narrow windows with 50% overlap. With this strategy, the word image can be represented as a sequence of script primitives [10]. As suggested in [6], each window is twice as wide as a stroke in the word image; in this study the window is therefore 8 pixels wide and as tall as the word image. A set of simple features is extracted from the pixels falling within each window. To extract suitable features, each window is divided horizontally into a number of zones, and two types of gradient features are extracted from each zone.
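The right-to-left sweep can be sketched as below, assuming an 8-pixel window and a 4-pixel step (50% overlap). How the paper handles a last partial window is not stated; zero-padding on the left, where a right-to-left word ends, is our assumption. The function names are hypothetical.

```python
import numpy as np


def sliding_windows(img, width=8, overlap=0.5):
    """Split a word image into overlapping vertical windows, ordered right to left.

    The image is zero-padded on the left (where a right-to-left word ends)
    so that the windows tile it exactly. Returns a list of (height, width) arrays.
    """
    step = max(1, int(round(width * (1 - overlap))))  # 4 pixels for width 8, 50% overlap
    h, w = img.shape
    pad = width - w if w < width else (-(w - width)) % step
    img = np.pad(img, ((0, 0), (pad, 0)))
    w = img.shape[1]
    return [img[:, s:s + width] for s in range(w - width, -1, -step)]


def split_zones(window, n_zones):
    """Divide a window horizontally into n_zones stacked cells (top to bottom)."""
    return np.array_split(window, n_zones, axis=0)


# Usage: a 45-pixel-high, 20-pixel-wide image yields 4 windows.
word = np.arange(45 * 20).reshape(45, 20)
wins = sliding_windows(word)
```

With a 45-pixel image height, `split_zones(window, 9)` gives nine 5-row cells and `split_zones(window, 5)` gives five 9-row cells, matching the cell counts used by the two feature sets.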

Gradient feature extraction
By applying a low-pass Gaussian filter, we convert the binary image of the Farsi script into a gray-scale image. We then use the horizontal and vertical Sobel operators, shown in Figure 3, to compute the gradient of the word image in the x and y directions, denoted $G_x$ and $G_y$. We also calculate the gradient direction:

$$\theta(x, y) = \operatorname{atan2}\big(G_y(x, y), G_x(x, y)\big) \qquad (1)$$

The intensity feature represents the number of white pixels within each sliding-window cell, and the numbers of horizontal and vertical edges are counted in each cell [12]. Accordingly, there are 3 features per cell, and since each sliding window has 9 horizontal cells, 27 feature values are extracted from each sliding window.
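The gradient computation above (Gaussian smoothing, Sobel filtering, then the direction in (1)) can be sketched in plain NumPy. Assumptions: σ = 1 for the Gaussian, the standard 3×3 Sobel masks, zero padding at the borders, and image rows increasing downward (so a positive $G_y$ points down). The helper names are hypothetical.

```python
import numpy as np

# Standard 3x3 Sobel masks (assumed to correspond to Figure 3).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def correlate(img, kernel):
    """Same-size 2-D correlation with zero padding (odd-sized kernels only)."""
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    h, w = img.shape
    return sum(kernel[i, j] * p[i:i + h, j:j + w]
               for i in range(kh) for j in range(kw))


def gaussian_kernel(sigma=1.0):
    """Normalized 2-D Gaussian kernel with radius 3 * sigma."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def image_gradients(binary, sigma=1.0):
    """Smooth a binary word image, then return G_x, G_y and theta from eq. (1)."""
    gray = correlate(binary.astype(float), gaussian_kernel(sigma))  # low-pass Gaussian
    gx = correlate(gray, SOBEL_X)
    gy = correlate(gray, SOBEL_Y)
    theta = np.arctan2(gy, gx)  # direction in [-pi, pi]
    return gx, gy, theta


# Usage: a vertical step edge (ink on the right half) gives a horizontal gradient.
step_img = np.zeros((15, 15))
step_img[:, 8:] = 1
gx, gy, theta = image_gradients(step_img)
```

Smoothing first turns the hard 0/1 transitions of the binary image into graded ramps, so the Sobel responses and the resulting directions vary continuously along curved strokes.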

Gradient direction as a feature
In (1), atan2(·, ·) returns the direction of the gradient vector $(G_x, G_y)$ in the range $[-\pi, \pi]$. The gradient direction at each pixel is quantized into 8 intervals of π/4 each (Figure 5). Each sliding window, as introduced in the previous section, is divided into 5 horizontal cells, and a normalized histogram over four directions is computed for each cell. Thus a sliding window on the word image is represented by a feature vector of 20 components (5 cells × 4 directions).
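A sketch of the 20-component directional feature for one window follows. The text quantizes directions into 8 intervals of π/4 but builds 4-bin histograms; here we assume that opposite intervals (which differ by π) share one of the 4 symbols. We also assume that pixels with zero gradient are skipped and that each histogram is normalized to sum to 1. These are our assumptions, and `direction_features` is a hypothetical name.

```python
import numpy as np


def direction_features(gx, gy, n_cells=5, eps=1e-6):
    """20-D directional feature vector for one sliding window (5 cells x 4 bins)."""
    theta = np.arctan2(gy, gx)  # eq. (1), in [-pi, pi]
    sector = np.floor((theta + np.pi) / (np.pi / 4)).astype(int) % 8  # 8 sectors of pi/4
    symbol = sector % 4  # ASSUMPTION: opposite sectors share one of 4 direction symbols
    valid = np.hypot(gx, gy) > eps  # ASSUMPTION: ignore pixels without a gradient
    feats = []
    for sym_cell, ok_cell in zip(np.array_split(symbol, n_cells),
                                 np.array_split(valid, n_cells)):
        hist = np.bincount(sym_cell[ok_cell], minlength=4).astype(float)
        total = hist.sum()
        feats.append(hist / total if total > 0 else hist)  # normalized histogram
    return np.concatenate(feats)


# Usage: every pixel has the same gradient direction, atan2(1, 0.5) ~ 1.107 rad.
gx_win = np.full((45, 8), 0.5)
gy_win = np.ones((45, 8))
f = direction_features(gx_win, gy_win)
```

Because every pixel in the example shares one direction, each of the five cell histograms puts all of its mass in the same bin.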

Vector quantization
In this research, 533000 frames extracted from the training set of the 198 database classes are used as input to a Kohonen self-organizing map (SOM, from the MATLAB Neural Network Toolbox) to obtain a codebook of 49 symbols. After the codebook is generated, each feature vector is mapped to a symbol from 1 to 49, namely the index of the closest codeword in Euclidean distance. The histogram distribution of the feature vectors over the codebook is shown in Figure 6. Each word image is thus represented by an observation sequence, and these sequences are given as input to the Hidden Markov Models.
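The codebook construction and symbol mapping can be illustrated with a minimal online Kohonen SOM in NumPy. This is not the MATLAB toolbox implementation the paper used: the 7 × 7 rectangular grid, Gaussian neighbourhood, linear decay schedules and epoch count are our assumptions (the smoothing step described later refers to hexagonal distances on the map, which this sketch does not model). Function names are hypothetical.

```python
import numpy as np


def train_som(data, rows=7, cols=7, epochs=5, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal online Kohonen SOM on a rows x cols grid (7 x 7 = 49 codewords)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_nodes = rows * cols
    init = rng.choice(len(data), n_nodes, replace=len(data) < n_nodes)
    codebook = data[init].astype(float)  # initialize from random training frames
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr = lr0 * (1.0 - frac)  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)  # squared grid distance to BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            codebook += lr * h[:, None] * (x - codebook)
            t += 1
    return codebook


def quantize(features, codebook):
    """Map each feature vector to the 1-based index of its nearest codeword (Euclidean)."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1) + 1
```

In use, `train_som` would be fitted on the stacked window feature vectors of the training set, and `quantize` would turn the windows of each word image, in sweep order, into its observation sequence of symbols 1 to 49.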

Hidden Markov Model classification
Hidden Markov Models (HMMs) are widely used for text recognition. HMMs are statistical models that were originally used for speech recognition. Because the HMM has performed well in speech recognition, and because of the similarities between speech recognition and cursive handwriting recognition, it has been extended to Farsi handwritten word recognition [3,13,14]. An HMM is represented by three parameters:

$$\lambda = (\pi, A, B)$$

where λ denotes the HMM, π is the vector of initial state probabilities, A is the state transition matrix, and B is the matrix of observation symbol probabilities. The element $a_{ij}$ of A is the probability that the system is in state j at time t given that it was in state i at time t−1.

It is well known that if adequate training data are not available, the HMM parameters, especially the observation symbol probabilities, are usually poorly estimated. As a result, the recognition rate degrades even when the test data vary only slightly. Properly smoothing the estimated observation probabilities can overcome this problem without requiring more training data. The parameter smoothing method proposed in [16] is used, in which the weighting coefficient $w_{(p,q),(k,l)}$ is defined as a function of the distance between two nodes $(p,q)$ and $(k,l)$ in the map:

$$w_{(p,q),(k,l)} = S_f \cdot c^{\,d_{(p,q),(k,l)} - 1}$$

Here c is a constant set to 0.5, the smoothing factor $S_f$ controls the degree of smoothing, and $d_{(p,q),(k,l)}$ is the hexagonal distance between the nodes with coordinates $(k,l)$ and $(p,q)$ in the codebook map. In the testing stage, the observation sequence of a test image is given to all HMMs to find the model most likely to have generated it. The Viterbi algorithm is used to match each model to the observed symbol sequence. The reader is referred to [17] for more details about HMMs.
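The testing stage described above can be sketched as log-domain Viterbi scoring followed by ranking of the class models. The sketch scores each model by the log-probability of its single best state path; the two 2-state toy models are hypothetical and only illustrate the mechanics, not the paper's trained right-to-left models. Symbols are 0-based here (codebook symbol minus 1).

```python
import numpy as np


def safe_log(p):
    """Elementwise log that maps zero probabilities to -inf without warnings."""
    with np.errstate(divide="ignore"):
        return np.log(np.asarray(p, dtype=float))


def viterbi_log_score(obs, pi, A, B):
    """Log-probability of the single best state path for a discrete HMM (pi, A, B)."""
    log_A, log_B = safe_log(A), safe_log(B)
    delta = safe_log(pi) + log_B[:, obs[0]]
    for o in obs[1:]:
        # best predecessor for every state j, then emit symbol o from state j
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return float(delta.max())


def rank_classes(obs, models):
    """Return class names sorted by decreasing Viterbi score."""
    scores = {name: viterbi_log_score(obs, *m) for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy models over a 3-symbol alphabet.
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5], [0.0, 1.0]])
models = {
    "word_a": (pi, A, np.array([[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]])),
    "word_b": (pi, A, np.array([[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]])),
}
```

Working with log-probabilities avoids numerical underflow on long observation sequences, and forbidden transitions simply become `-inf` entries that the maximization never selects.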

Experimental results
In this section, firstly, the collected database, namely FARSA, is introduced.Then, the results of applying the proposed method to the database are reported.

FARSA database
The Farsi alphabet has four more characters than the Arabic alphabet. Because the most accurate recognition results can only be obtained with a dataset appropriate to the language [18], standard Arabic databases cannot be used for Farsi, and no suitable Farsi handwritten database was available.
A suitable database must include a significant number of samples for each word class in the dictionary. We prepared a database of 30000 images of 300 formal words that are common in the Farsi language. The handwritten words were scanned at 300 dpi with 256 gray levels. The database is called FARSA [11].

Experiments and results
In this paper, 198 word classes of FARSA were used to evaluate the proposed system. This number of classes was chosen so that the proposed method could be compared with the methods in [6,10], which used the same number of word classes. A subset of 19800 images from the FARSA database was selected; about 70% of these samples were chosen at random as the training set, and the rest were used for testing. All HMMs are initialized with the same parameters. Table 1 reports the performance of the word recognition system as a top-n recognition rate, i.e. the percentage of test words whose true class lies among the first n positions of the candidate list. HMMs are ranked by the log-likelihood that the given test image was produced by each model. Table 1 also compares the two types of gradient features. As can be seen, directional gradient features are more appropriate than intensity gradient features for handwritten word recognition. The gradient magnitude performed very well as a feature for typewritten text recognition in [12], but for handwriting, because of the variety of writing styles and changes in stroke stretch, the directional gradient features work better, as seen in Table 1. Using the angle of the gradient improves the recognition rate by about 16%. As mentioned previously, because of the variation of handwritten words within a class and the limited amount of training data, the recognition rate is still modest. The proposed method was therefore repeated after smoothing the observation probabilities of the HMMs. As reported in [10], the appropriate smoothing factor is 0.001; Table 1 also reports the results of smoothing the HMM parameters with this smoothing factor.
In another experiment, we considered three codebook sizes. Figure 7 shows the results of this experiment with the directional gradient features. As can be seen, a codebook of size 49 achieves a higher recognition rate than one of size 36, while the results for sizes 49 and 64 are almost identical. On the other hand, Figure 8 shows that the computational complexity and training time for size 64 are much higher than for size 49, so we chose 49 as the codebook size. Table 2 compares the performance of the proposed word recognition systems with three existing methods [6,7,10].
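The top-n recognition rate defined above can be computed from a matrix of per-class log-likelihoods, as in this sketch. The score matrix and labels below are made-up toy values, and `top_n_rate` is a hypothetical helper; ties between classes are broken by `argsort` order.

```python
import numpy as np


def top_n_rate(log_likelihoods, true_class, n):
    """Percentage of test samples whose true class is among the n best-scoring models.

    log_likelihoods[i, c] is the score of test sample i under the HMM of class c.
    """
    ranking = np.argsort(-log_likelihoods, axis=1)[:, :n]  # best n classes per sample
    hits = (ranking == np.asarray(true_class)[:, None]).any(axis=1)
    return 100.0 * hits.mean()


# Toy example: 3 test samples scored against 3 class models.
scores = np.array([[-1.0, -2.0, -3.0],
                   [-5.0, -1.0, -2.0],
                   [-1.0, -3.0, -2.0]])
labels = [0, 2, 1]
```

In this toy case the top-1 rate is one third, the top-2 rate two thirds, and the top-3 rate 100%, since every class appears among the top 3 of 3 candidates.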
In an earlier work [10], a combination of chain-code histogram features and intensity features was used in the feature extraction stage, giving feature vectors of length 25.
Although this work uses fewer features than [10] (20 directional gradient features per window), the performance has improved.

Conclusion
We proposed an offline system for recognizing Farsi handwritten words. Directional and intensity gradient features were employed, and the extracted feature vectors were coded using a vector quantization algorithm. The resulting codes were used as observations to train an HMM for each word class. The proposed system was evaluated on a newly prepared database, FARSA. In the experiments, directional gradient features outperformed intensity gradient features, and the proposed method outperformed the existing methods it was compared with.

Table 2. Recognition rate of the proposed method compared with other methods (columns: number of testing images, top-n recognition rate).
The proposed word recognition system consists of three stages: preprocessing, feature extraction, and classification. Its block diagram is shown in Figure 1.

Figure 1. Block diagram of the proposed system.
Imani et al./ Journal of AI and Data Mining, Vol 4, No 1, 2016.

Figure 2. An example of skeletonization and dilation: (a) original image, (b) word skeleton, (c) dilation of the word skeleton.

Figure 3. Sobel masks: (a) horizontal mask and (b) vertical mask.

Using intensity and magnitude of gradient components as a feature set
To describe the image inside a sliding window, the window is divided into 9 horizontal cells, and 3 features are extracted from each cell: the image intensity and the horizontal and vertical gradient components obtained from the Sobel operators (Figure 4).
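A sketch of the 27-component intensity feature vector for one window (9 cells × 3 features) follows. The text counts white pixels and horizontal and vertical edges per cell but does not say how an edge pixel is decided; here we assume a pixel counts as an edge when the absolute Sobel response exceeds a hypothetical threshold `edge_thresh`. The function name is also hypothetical.

```python
import numpy as np


def intensity_features(window, gx, gy, n_cells=9, edge_thresh=0.5):
    """27-D feature vector for one sliding window: 9 cells x 3 features.

    Per cell: number of white (ink) pixels, number of pixels with a strong
    horizontal Sobel response |G_x|, and number with a strong vertical response |G_y|.
    """
    feats = []
    cells = zip(np.array_split(window, n_cells),
                np.array_split(gx, n_cells),
                np.array_split(gy, n_cells))
    for w_cell, gx_cell, gy_cell in cells:
        feats += [int(w_cell.sum()),
                  int((np.abs(gx_cell) > edge_thresh).sum()),  # ASSUMED edge rule
                  int((np.abs(gy_cell) > edge_thresh).sum())]
    return np.array(feats, dtype=float)


# Usage with a synthetic window: the top cell is fully inked and has one strong |G_y| row.
win = np.zeros((45, 8))
win[:5, :] = 1
gx_demo = np.zeros((45, 8))
gy_demo = np.zeros((45, 8))
gy_demo[4, :] = 1.0
feat = intensity_features(win, gx_demo, gy_demo)
```

The features are ordered cell by cell from the top of the window, so `feat[0:3]` describes the top cell.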

Figure 5. Quantizing the gradient direction with four symbols.

)
= {  |  = (  = | −1 = )} (4)  = {  (  )|  (  ) = (  =   |  = )} (5) ), b j (o k ) is the probability that we observe o k at time t,o t = o k , assuming that system at this time is in j th state.Here, N equals to the number of states, T to the length of the sequence of observations to the number of possible observations (form the training set), and S = {s|1 ≤ s ≤ N}[15].To design an HMM classifier, several procedures are needed to be performed including (i) deciding the number of states and observations,(ii) choosing HMM topology, (iii) model training using selected samples and (iv) testing and evaluation[9].The number of chosen states is proportioned to the word length.Indeed for each word, the number of states is considered proportional to the minimum number of frames in the word picture.In this paper, a right-to-left HMM is employed.Each state could have a self-transition, or a transition to the next or two next states.In the training stage, the model is optimized using the training data through an iterative process.The Baum-Welch algorithm, a variant of the Expectation Maximization (EM) algorithm, is utilized to maximize the observation sequence probability )
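The topology described above can be sketched as an initial $(\pi, A, B)$ for one word class, before Baum-Welch re-estimation. Assumptions: one state per `frames_per_state` = 2 frames of the shortest training image (the paper only says the number of states is proportional to it), uniform initial transitions over the allowed moves, and uniform emissions over the 49 codebook symbols. Frames are indexed in reading order, so "right-to-left" appears as an upper-triangular banded matrix. All names are hypothetical.

```python
import numpy as np


def init_right_to_left_hmm(min_frames, n_symbols=49, frames_per_state=2, max_jump=2):
    """Initial (pi, A, B) for a right-to-left discrete HMM.

    Each state may stay, move to the next state, or skip one state
    (max_jump=2). The number of states is proportional to min_frames,
    the frame count of the shortest training image of the word.
    """
    n_states = max(1, min_frames // frames_per_state)  # ASSUMED proportionality
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        last = min(i + max_jump, n_states - 1)
        A[i, i:last + 1] = 1.0 / (last - i + 1)  # uniform over allowed transitions
    pi = np.zeros(n_states)
    pi[0] = 1.0  # always start in the first state
    B = np.full((n_states, n_symbols), 1.0 / n_symbols)  # uniform emissions
    return pi, A, B


pi, A, B = init_right_to_left_hmm(min_frames=20)
```

Zero entries in A stay zero under Baum-Welch re-estimation, so this initialization fixes the allowed topology while training adjusts the nonzero probabilities.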

Figure 7. Comparison of recognition results with different codebook sizes.

Figure 8. Training time for different codebook sizes.
is used.After training all of the HMMs by the Baum-Welch algorithm, the value