Recognition of Hindi (Arabic) Handwritten Numerals

Recognition of handwritten numerals has been one of the most challenging topics in image processing. This is due to its contribution to the automation of several applications. The aim of this study was to build a classifier that can recognize offline handwritten Arabic numerals, in support of applications that deal with Hindi (Arabic) numerals. A new algorithm for Hindi (Arabic) numeral recognition is proposed. The algorithm was developed using MATLAB and tested on a large sample of handwritten numerals collected from writers of different ages. Pattern recognition techniques are used to identify the Hindi (Arabic) handwritten numerals. In testing, high recognition rates were achieved, ranging from 95% for some numerals up to 99% for others. The proposed algorithm uses a powerful set of features, which proved effective in the recognition of Hindi (Arabic) numerals.


INTRODUCTION
The development of Optical Character Recognition (OCR) systems is considered one of the most important research areas in pattern recognition. OCR allows a machine to recognize characters automatically through an optical mechanism. In other words, it is the electronic translation of images of handwritten numerals into a computer-readable text format.
Recently, the recognition of handwritten numerals has become an intensive area of research, aimed at increasing the functionality of OCR systems. Numeral recognition systems can be used in several applications, such as check verification in banks, office automation, postal address reading and communication technology.
There are several approaches to the numeral/character recognition problem; each depends on a set of features to be extracted and on the way they are extracted.
Handwritten numeral recognition is a hard task because of variations in the written shape (in size, slant and writing style) and the different kinds of noise that break the strokes of numerals or change their topology. Handwriting varies even when one person writes the same character twice, and one can expect enormous dissimilarity among different people. Figure 1 shows a sample of standard and handwritten Hindi (Arabic) numerals. This study describes an off-line recognition technique for Arabic handwritten numerals that extracts features from numeral images to provide efficient and reliable results. The most important aspect of a handwriting recognition scheme is the selection of a powerful set of features that is reasonably invariant and robust with respect to the shape and slant variations caused by different writing styles.

MATERIALS AND METHODS
The recognition of handwritten text or numbers is a hard task because it depends on the writer and on the accuracy of the writing. Clear and accurate writing therefore helps the OCR system to achieve very high recognition rates.
Hindi (Arabic) numerals are used by Arabs and Latin-based languages. Here, the term Hindi (Arabic) numerals refers to the Indian numerals that are used in Arabic writing.

General Outline of the Proposed Approach
As depicted in Fig. 2, the proposed model is composed of four steps: importing the numeral image, preprocessing, extracting features and classification, which finally recognizes the imported numeral.

Image Preprocessing:
This step applies the preprocessing techniques to the imported image. The preprocessing step includes a set of operations: binarization, noise removal and edge detection. Figure 3 shows an example of an imported numeral image and its preprocessing result.
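The three preprocessing operations can be illustrated with a minimal Python sketch. The text does not state which binarization, noise-removal or edge-detection methods were used, so everything below is an assumption made for illustration: a fixed threshold of 128, removal of isolated ink pixels as the noise filter, and an edge defined as an ink pixel that touches the background or the image border. All function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1; dark ink becomes 1."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def neighbours(img, r, c):
    """Yield the values of the 8-neighbours of (r, c) that lie inside the image."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                yield img[rr][cc]


def remove_isolated_pixels(img):
    """Clear ink pixels that have no ink neighbour (assumed speckle filter)."""
    return [[1 if px and any(neighbours(img, r, c)) else 0
             for c, px in enumerate(row)]
            for r, row in enumerate(img)]


def edges(img):
    """Keep only ink pixels that touch the background or the image border."""
    out = []
    for r, row in enumerate(img):
        out_row = []
        for c, px in enumerate(row):
            nb = list(neighbours(img, r, c))
            on_border = len(nb) < 8
            out_row.append(1 if px and (on_border or 0 in nb) else 0)
        out.append(out_row)
    return out


def preprocess(gray, threshold=128):
    """Binarize, remove isolated pixels, then extract the outline."""
    return edges(remove_isolated_pixels(binarize(gray, threshold)))
```

Whether the later feature-extraction steps operate on the edge image or on the binarized image is not stated in the text; the sketch simply chains the three operations in the order they are listed.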

Feature Extraction
Checking the existence of loops: Several techniques can be applied to detect a loop in an image. In this study, the technique proposed by Kim et al. (2009) was applied.
Finding the centroid of the image: The centroid is a useful feature that describes the center of weight of the objects in an image. It is calculated as {Mean(X), Mean(Y)}, where X and Y are the coordinates of the pixels of the numeral image.
Image segmentation: Segmentation is the process of partitioning a digital image into multiple parts or sub images. It is used to simplify and/or change the representation of an image into something that is more meaningful to analyze. More precisely, in this study we suggest dividing the numeral image into two parts (sub images) according to its centroid value. An example of partitioning is shown in Fig. 4.
Horizontal projections: As another feature, the horizontal projection (projection on the x-axis) is determined for each sub image. Figure 5 shows examples of numeral projections.
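As an illustration of what loop detection involves, the sketch below uses a generic flood-fill test. It is not the technique of Kim et al. (2009), whose details are not given here; it is a commonly used substitute. The background is filled from the image border, and any background pixel that the fill cannot reach must be enclosed by ink. The function name and the 0/1 image representation are assumptions.

```python
from collections import deque


def has_loop(img):
    """Return True if some background (0) region is fully enclosed by ink (1)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and img[r][c] == 0:
                seen[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= rr < rows and 0 <= cc < cols
                    and not seen[rr][cc] and img[rr][cc] == 0):
                seen[rr][cc] = True
                queue.append((rr, cc))
    # Any background pixel the fill never reached lies inside a loop.
    return any(img[r][c] == 0 and not seen[r][c]
               for r in range(rows) for c in range(cols))
```

This test only finds open (hollow) loops. A completely filled blob has no enclosed background, so it returns False for such a shape.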
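The centroid computation {Mean(X), Mean(Y)} can be sketched as follows. The sketch assumes a binary image stored as rows of 0/1 values, with x taken as the column index and y as the row index of each ink pixel; the function name is hypothetical.

```python
def centroid(img):
    """Return (mean_x, mean_y) over ink pixels; x is the column, y is the row."""
    xs = [c for row in img for c, px in enumerate(row) if px]
    ys = [r for r, row in enumerate(img) for px in row if px]
    if not xs:
        raise ValueError("image contains no ink pixels")
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

For example, an image with ink at (row 0, col 1), (row 1, col 1), (row 2, col 1) and (row 2, col 2) has its centroid at x = 1.25, y = 1.25.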
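A minimal sketch of the partitioning step is given below. The text does not state the direction of the cut, so the sketch assumes that the image is divided into an upper and a lower sub image at the centroid row, which is consistent with the later mention of an "upper sub image". The function name and the rounding rule are assumptions.

```python
def split_at_centroid_row(img):
    """Split a binary image into (upper, lower) parts at the centroid row."""
    if len(img) < 2:
        raise ValueError("need at least two rows to split")
    ink_rows = [r for r, row in enumerate(img) for px in row if px]
    if not ink_rows:
        raise ValueError("image contains no ink pixels")
    # Mean row index of the ink pixels, rounded to the nearest row.
    cut = round(sum(ink_rows) / len(ink_rows))
    # Clamp the cut so that both sub images keep at least one row.
    cut = min(max(cut, 1), len(img) - 1)
    return img[:cut], img[cut:]
```

The two parts always concatenate back to the original image, so no rows are lost or duplicated by the split.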
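A projection profile is simply a count of ink pixels along one direction. The text calls the feature a horizontal projection and glosses it as a projection on the x-axis; because the two phrases are sometimes used for different profiles, the sketch below provides both and leaves the choice to the reader. The function names are hypothetical.

```python
def projection_on_x(img):
    """Count ink pixels in each column, giving a profile along the x-axis."""
    return [sum(col) for col in zip(*img)]


def projection_on_y(img):
    """Count ink pixels in each row, giving a profile along the y-axis."""
    return [sum(row) for row in img]
```

Either profile would be computed separately for each of the two sub images obtained from the centroid split.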

Image Classification
In this step, the resulting features (loops or projections) are used to recognize the numeral. This is achieved by comparing them with the features of the standard Hindi (Arabic) numerals, as shown in Fig. 6 and 7 respectively.
• If a loop exists, the detected number is one of the following: {"five", "zero", "nine"}.
• If the number is a filled loop then it is "zero", while if it contains a shallow loop then it is either "five" or "nine". To distinguish between them: "nine" contains a line and a shallow loop, while "five" is only a shallow loop.
By applying these steps good results were achieved, but an error may sometimes occur in detecting the number "three". It is sometimes detected as "two", depending on the position of the centroid point, as shown in Fig. 8.
Thus, to increase the robustness of the system, we propose that, if the projection result of the numeral image is the same as that of the number "two", the detection be verified. To do so, re-apply steps 6 and 7 to the upper sub image only. If this results in projections like those in Fig. 9, then the number is "three"; otherwise, it is correctly detected as "two".
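The decision rules of this section can be summarized in a short sketch. The loop rules follow the text directly. How a projection is compared with the standard projections is not specified, so the sketch assumes a nearest-template match using the sum of absolute differences between equal-length profiles. The feature dictionary, its keys and the template values used in the example are all hypothetical.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two equal-length profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest_template(profile, templates):
    """Return the label whose template profile is closest to `profile`."""
    return min(templates, key=lambda label: l1_distance(profile, templates[label]))


def classify(features, templates):
    """Apply the loop rules first, then fall back to projection matching.

    `features` is a dict with the keys 'has_loop', 'filled', 'has_line'
    and 'profile' (an assumed layout, not taken from the paper).
    """
    if features["has_loop"]:
        if features["filled"]:
            return "zero"
        # A shallow loop with an attached line is "nine", otherwise "five".
        return "nine" if features["has_line"] else "five"
    return nearest_template(features["profile"], templates)
```

A call such as `classify({"has_loop": False, "filled": False, "has_line": False, "profile": [0, 2, 0]}, {"one": [0, 3, 0], "two": [3, 1, 1]})` selects "one", because its made-up template is the closer of the two. The additional check that separates "three" from "two" is not covered by this sketch.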

RESULTS
The experiments were applied to a collection of Hindi (Arabic) handwritten numerals, collected from a large number of people of different ages, to test the proposed model. All the experiments were implemented in the MATLAB environment.
The results of the proposed method were highly accurate; it reaches high recognition rates for several samples, as shown in Table 1.

DISCUSSION
In this study we describe a new approach to off-line handwritten numeral recognition. Recognition faces many problems caused by writing habits and instruments; we suggest a recognition method that is able to account for a variety of distortions due to eccentric handwriting.
Various methods have been proposed, and high recognition rates reported, for the recognition of English handwritten digits (Berkes, 2005; Liu et al., 2004; Kussul and Baidyk, 2004; Tang, 2006). In recent years, many researchers have addressed the recognition of Arabic text, including Arabic numerals (Al-Omari and Al-Jarrah, 2004; Bouslama, 1999; Salourn, 2001; Salah et al., 2002; Alma'adeed et al., 2004; Touj et al., 2005). Alfonse et al. (2010) presented a hybrid classifier for segmenting Arabic numerals. The classifier is built using both multilayer neural networks and decision trees, and reaches an accuracy of about 83% (Alfonse et al., 2010). Mahmoud and Awaida (2009) suggested a technique for automatic off-line recognition of handwritten Arabic (Indian) numerals using Support Vector Machines and Hidden Markov Models. They achieved average recognition rates of about 99.83% and 99.00% using the Support Vector Machine and Hidden Markov Model classifiers respectively (Mahmoud and Awaida, 2009).

AJEAS
Mahmoud and Abu-Amara (2010a; 2010b) proposed a technique for the recognition of off-line handwritten Arabic numerals using Radon and Fourier transforms. They reach high recognition rates of around 98% (Mahmoud and Abu-Amara, 2010a; 2010b).

CONCLUSION
In this study a robust algorithm for offline Hindi (Arabic) numeral recognition is proposed. Its robustness comes from the set of extracted features. In summary, the proposed model starts by extracting a set of features, such as detecting loops or dividing the numeral image according to the position of its centroid point, and finally classifies the number according to the shape of the horizontal projection or the existence of loops. The experimental results of this model show high accuracy, with recognition rates of around 98% across all numerals.

Fig. 9.
Fig. 7. The features of the standard Hindi (Arabic) numerals that contain loops

Fig. 5.
• If no loops exist, compute the centroid of the image.
• Divide the image according to the centroid point.
• Find the projection for the generated images.
• Finally, to recognize the number correctly, compare the projection results with the standard set of projections shown in Fig. 6.

Table 1. Detection rates for characters without secondary parts