Handwritten Arabic Characters Recognition Based on Wavelet Entropy and Neural Network

This work proposes a wavelet packet and Shannon entropy (SEWP) technique for a handwritten Arabic character recognition system. Entropy has been applied in many applications; in this study, the combination of Shannon entropy with the wavelet transform (WT) is proposed for handwritten Arabic character recognition. The investigation procedure was based on feature extraction and classification. For feature extraction, the distinguishing features of handwritten Arabic characters were extracted using the SEWP technique. For classification, a probabilistic neural network (PNN) was applied because of its good performance and fast processing. In the experimental investigation, the quality of the wavelet transform in conjunction with Shannon entropy was studied. In addition, the capability of the proposed system was analyzed by comparing it with other feature extraction methods.


INTRODUCTION
Recently, there has been increased interest among researchers in problems related to machine simulation of the human reading process. Intensive research has been carried out in this area, as testified by the enormous number of technical papers and reports in the literature devoted to character recognition. This subject has attracted immense research interest not only because of its very challenging nature but also because it provides the means for automatic processing of large volumes of data in postal code reading, office automation and other business and scientific applications. Optical character recognition (OCR) deals with the recognition of optically processed characters rather than magnetically processed ones. In a typical OCR structure, input characters are read and digitized by an optical scanner. Each character is then located and segmented, and the resulting matrix is fed into a preprocessing stage for smoothing, noise elimination and size normalization. Off-line recognition can be considered the most general case: no special device is required for writing, and signal interpretation is independent of signal generation, as in human recognition [1-7].
Many investigators have been working on cursive script recognition for more than three decades [8]. Nevertheless, the field remains one of the most challenging problems in pattern recognition and all the existing systems are still limited to restricted applications. The process by which people recognize handwritten characters, words and documents has been the subject of intense interest and study by researchers from diverse fields [9]. A good interpretation of the mechanism of human recognition of handwritten documents will have a significant impact on the development of machines capable of recognition and understanding of handwritten documents. However, the human recognition process is quite complex and it combines information extracted at different levels: characters, whole words, keywords and contextual processing [9,10]. The efficiency of human recognition of handwriting can be attributed to the effective integration of multiple cues and exploitation of redundancies contained in most documents. However, if the goal of this study is to develop machines that are capable of automatic transcription of handwritten documents, then one must recognize the immense difficulty of adopting the human recognition process. A document on practical approaches to handwriting recognition will include characters, words, phrases, sentences and whole paragraphs.
Handwriting recognition, generally speaking, is the transcription of a handwritten document into machine-written text. The many approaches to handwriting recognition found in the literature can be summarized as: (a) techniques based on holistic approaches, whereby an entire word or character string is recognized as a unit, and (b) techniques based on the extraction and recognition of the characters enclosed in a word or string [8,9].
Several excellent literature reviews on Arabic character recognition have appeared [3,10]. Classical off-line optical recognition of handwriting has also been described [7,11]. The recognition process differs only slightly between Arabic and Latin-alphabet-based languages [1,4-6].
Segmentation of characters is a significant stage in character recognition for cursive writing, both handwritten and printed. Three segmentation strategies can be distinguished: the standard approach, in which segment identification is based on "character-like" properties; the recognition-based strategy, in which the system searches the image for components that match classes in its alphabet; and the holistic method, in which the system seeks to recognize words as a whole, thus avoiding the need to segment into characters [11]. Clustering techniques have been used for classification, and tree representations for the description of various characters [12]. Tree representations and fuzzy constrained graph models that tolerate large variations in writing styles were also reported [2].
Recognition of Arabic printed text has been attempted using structural feature extraction. A SIFT approach for Arabic character recognition was proposed [13]. Many other methods have appeared in the literature, such as parallel Arabic OCR systems [14], hidden Markov models [10] and a system based on four types of basic features [15].
Neural networks have been used in the literature as powerful classifiers [16,17]. A text-independent approach was also proposed that derives writer-specific texture features using multichannel Gabor filtering and spatial gray-level dependence (SGLD). The method requires uniform blocks of text that are generated by word de-skewing, setting a predefined distance between text lines and words, and text padding. Two sets of 20 writers, with 25 samples per writer, were used in the evaluation. Nearest-centroid classification using weighted Euclidean distance and Gabor features achieved a 96% writer identification rate. Eglin et al. applied the SGLD to extract several features that characterize the writing style of ancient Latin and Arabic manuscripts of the Middle Ages. They argued that the SGLD is identical on different text areas of the same document, is robust to noise and does not require any image segmentation or layout analysis. They reduced the feature dimensionality by using Haralick descriptors. From these mixtures of features, they defined a "style similarity" measure and formed a large database of image samples of writings with a paleographic description, in order to develop a reliable image retrieval system for medieval writing styles [7]. A large number of features was also proposed, divided into two categories: macro-features operating at the document, paragraph or word level, and micro-features operating at the word or character level. Text-dependent statistical evaluations were performed on a dataset of 1000 writers who each copied a fixed text of 156 words three times. The results showed that micro-features are better than macro-features in identification tests, with a performance exceeding 80%. An automated process of writer identification using scanned images of handwriting, which thereby provides a computer analysis of handwriting individuality, was proposed [10]. The relationship of handwriting styles between any two samples is computed by using proper distance measures between their corresponding feature vectors.
The features and writer classification operate in the general framework of statistical pattern recognition. Joining texture-level and allograph-level features yields very high writer identification and verification performance, with usable rates for datasets containing 103 writers.
In this study, a combination of wavelet entropy and a probabilistic neural network is proposed for Arabic handwritten character recognition. The features are extracted by means of the entropy calculated for the WP coefficients and are then fed to the probabilistic neural network for classification.

MAIN CHARACTERISTICS OF ARABIC WRITING
Emphasis is a distinguishing feature of Semitic languages such as Arabic. The term 'emphasis' refers to consonants produced with a secondary constriction in the posterior vocal tract and a primary constriction typically in the dental/alveolar region [18]. Arabic text is written from right to left and is always cursive. The shape of an Arabic character changes with its position in the word. An Arabic character has up to four different shapes; the shape depends on the type of character to its right and on its position within the word. Table 1 shows the Arabic character set in the four different shapes. The Arabic character set is composed of 28 basic characters. Fifteen of them have dots and 13 are without dots. Dots above and below the characters play a major role in distinguishing characters that differ only by the number or location of dots. Take, for example, the letters ب, ت, ث, ي and ن. In their middle form, all five letters are written the same way; they differ only by the number or location of the dots. There are four characters that may carry the additional character Hamzah (ء): Alif (أ, إ), Waw (ؤ), Yaa (ئ) and Kaf (ك). There are also some other secondary forms used above and below the characters to indicate vowels, but we exclude them from our discussion here. Arabic characters do not normally have a fixed width or fixed size, even in printed form.

Wavelet Packet Transform Feature Extraction Method
To decompose the data by the wavelet packet transform (WPT), we start from the common form of the equivalent low-pass discrete-time signal (1), which is customized into the signal model (3). The signal model in (3) is the basic form of the wavelet packet transform, which is used in signal decomposition. The signal is carried by orthogonal functions, which shape a wavelet packet composition. For a certain tree structure, the function $\varphi_l^n$ in (7) is called the constituent terminal function of $\varphi_0^1$. In this work, the tree consists of two stages and therefore has three high-pass nodes and three low-pass nodes.
The wavelet packet is used to extract additional features to guarantee a higher recognition rate. In this study, WPT is applied at the feature extraction stage, but the resulting data are not suitable for a classifier because of their great length (for example, a speech signal of 35582 samples grows to 71166 coefficients after WPT decomposition at level two) [19,28,29]. In this paper, we use the Shannon entropy obtained from the WP tree nodes to construct the feature vector used for handwritten character recognition.
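The growth in data length described above can be illustrated with a minimal sketch of a two-stage wavelet packet tree. It assumes the Haar (db1) filter pair and pads odd-length nodes by repeating the last sample; the function names and the padding rule are illustrative choices, not taken from the paper.

```python
import math

def haar_step(samples):
    """One Haar (db1) analysis step: returns (approximation, detail)."""
    x = list(samples)
    if len(x) % 2:
        x.append(x[-1])  # assumption: pad odd lengths by repeating the last sample
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet_levels(samples, depth):
    """Return levels, where levels[d] holds the 2**d coefficient lists at depth d."""
    levels = [[list(samples)]]
    for _ in range(depth):
        children = []
        for node in levels[-1]:
            approx, detail = haar_step(node)  # every node is split, not only the approximation
            children.extend([approx, detail])
        levels.append(children)
    return levels

def coefficient_count(levels):
    """Total number of coefficients stored in all nodes below the root."""
    return sum(len(node) for level in levels[1:] for node in level)

# Hypothetical 35582-sample test signal, decomposed to level two.
levels = wavelet_packet_levels([float(i % 7) for i in range(35582)], depth=2)
print(len(levels[1]) + len(levels[2]))  # nodes below the root: 6
print(coefficient_count(levels))        # coefficients below the root: 71166
```

Under these assumptions the two-stage tree has six nodes below the root (three low-pass and three high-pass), and a 35582-sample input yields 71166 coefficients, which is why a more compact feature such as a per-node entropy is attractive.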

Wavelet Packet Entropy
For a given orthogonal wavelet function, a library of wavelet packet bases is generated. Each of these bases offers a particular way of coding signals, preserving global energy and reconstructing exact features. Because the WPT coefficients are too long to be fed to a classifier directly, we have to seek a better representation of the features. Previous studies showed that the use of the entropy of WP as features in recognition tasks is efficient [9]. The entropy of a specific sub-band signal may therefore be employed as a feature for recognition tasks. In this paper, the Shannon entropy obtained from the WP is employed for handwritten Arabic character recognition. The Shannon wavelet packet feature extraction method can be summarized as follows:
1) Decompose each column of the grayscale image by the wavelet packet transform at depth 4 (level 4) with the Daubechies wavelet db1.
2) Calculate the wavelet Shannon entropy for each column over all nodes of the wavelet packet at depth 4 using the Shannon entropy equation (see Fig. 1), where $s$ is the signal and $s_i$ are the WPT coefficients.
Entropy is a common concept in many fields, mainly in signal processing [23]. The classical entropy-based criterion describes information-related properties for a precise representation of a given signal. Entropy is commonly used in image processing; it carries information about the concentration of the image. In addition, a method for measuring entropy is a powerful tool for quantifying the ordering of non-stationary signals. Fig. 1a shows the Shannon entropy calculated for the WP at depth 4 for three different handwritten Arabic characters. Figs. 1b and 1c show the features of similar handwritten Arabic characters.
We can see that the feature vector extracted by Shannon entropy is appropriate for handwritten Arabic character recognition. This conclusion follows from the criteria that an extracted feature vector should satisfy [15]: 1) it should vary widely from class to class; 2) it should be stable over a long period of time; 3) it should not be correlated with other features.
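The two extraction steps can be sketched as follows. This is a minimal illustration, assuming the non-normalized Shannon entropy commonly used with wavelet packets, $E(s) = -\sum_i s_i^2 \log(s_i^2)$, with the convention $0 \log 0 = 0$. The function names, the odd-length padding rule and the column-by-column concatenation of the features are assumptions, not details confirmed by the text.

```python
import math

def haar_step(samples):
    """One Haar (db1) analysis step: returns (approximation, detail)."""
    x = list(samples)
    if len(x) % 2:
        x.append(x[-1])  # assumption: pad odd lengths by repeating the last sample
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def terminal_nodes(samples, depth):
    """Coefficient lists of the 2**depth wavelet packet nodes at the given depth."""
    nodes = [list(samples)]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes

def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy: -sum(c**2 * log(c**2)), with 0*log(0) = 0."""
    total = 0.0
    for c in coeffs:
        p = c * c
        if p > 0.0:
            total -= p * math.log(p)
    return total

def column_features(column, depth=4):
    """One entropy value per wavelet packet node at the given depth."""
    return [shannon_entropy(node) for node in terminal_nodes(column, depth)]

def image_features(image, depth=4):
    """image: list of rows of gray levels; features are concatenated column by column."""
    features = []
    for column in zip(*image):
        features.extend(column_features(column, depth))
    return features

# Hypothetical 16x3 gray-level image: 3 columns give 3 * 16 entropy values.
toy_image = [[float((r * c) % 5) for c in range(3)] for r in range(16)]
print(len(image_features(toy_image)))
```

Each column is reduced to 16 numbers at depth 4 regardless of how the coefficients are distributed, which is the data reduction the method relies on.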

PROPOSED PROBABILISTIC NEURAL NETWORKS ALGORITHM
Although there exist numerous enhanced versions of the original PNN, which are either more economical or exhibit appreciably better performance [30,31], for simplicity of exposition we adopt the original PNN for the classification task (Fig. 2). The proposed algorithm, denoted by PNN, is constructed from the handwritten character feature vectors (patterns) of 470 WP sub-signals.

Fig. 1. The entropy features for (a) different characters and (b, c) similar characters

The SP parameter is the spread of the radial basis functions. We use an SP value of one because that is a typical distance between the input vectors. If SP approaches zero, the network acts as a nearest neighbor classifier. As SP becomes larger, the designed network takes into account several nearby design vectors. We create a two-layer network. The first layer has radial basis transfer function (RB) neurons (as shown in Fig. 3); it calculates its weighted inputs with the Euclidean distance weight function and its net input with the net product function, which calculates a layer's net input by combining its weighted inputs and biases. The second layer has competitive transfer function neurons (Fig. 4) and calculates its weighted input with the dot product weight function. A weight function applies weights to an input to get weighted inputs. The second layer calculates its net input with the net sum function (called NETSUM), which calculates a layer's net input by combining its weighted inputs and biases. Only the first layer has biases. The PNN sets the first layer weights to X', and the first layer biases are all set to 0.8326/SP, resulting in radial basis functions that cross 0.5 at weighted inputs of +/-SP. The second layer weights are set to P [27]. Now, we test the network on new feature vectors (testing) to be classified. This process is called simulation. The obtained feature vector is fed to the PNN to be classified, as seen in the flow chart presented in Fig. 5.
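The two-layer construction can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: it assumes the radial basis transfer function exp(-n^2), so that a bias of sqrt(ln 2)/SP (about 0.8326/SP) makes the response cross 0.5 at a distance of SP. The function names and the toy data are hypothetical.

```python
import math

def radial_basis(n):
    """Radial basis transfer function: 1 at n = 0, decaying as exp(-n**2)."""
    return math.exp(-n * n)

def euclidean_distance(a, b):
    """Euclidean distance weight function between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pnn_classify(train_vectors, train_labels, query, spread=1.0):
    """Classify `query` with a two-layer PNN sketch; returns the winning label."""
    bias = math.sqrt(math.log(2.0)) / spread  # approximately 0.8326 / SP
    scores = {}
    for vector, label in zip(train_vectors, train_labels):
        # First layer: distance to a stored pattern, multiplied by the bias (net product).
        activation = radial_basis(bias * euclidean_distance(vector, query))
        # Second layer: sum the activations of the patterns belonging to each class.
        scores[label] = scores.get(label, 0.0) + activation
    # Competitive transfer function: the class with the largest sum wins.
    return max(scores, key=scores.get)

# Hypothetical two-class training set of 2-D feature vectors.
patterns = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
print(pnn_classify(patterns, labels, (0.2, 0.3)))
print(pnn_classify(patterns, labels, (5.5, 5.1)))
```

With a very small spread only the closest stored pattern contributes noticeably, which is the nearest neighbor behavior mentioned above; a larger spread lets several nearby patterns vote.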


Structure of the original probabilistic neural network
2015; Article no. BJAST.2015.285

RESULTS AND DISCUSSION
In the proposed work, the test Arabic handwritten characters, as separated character images and separated words, were recorded via a scanner. The handwritten characters and words were written by four Arab males. The recording process was carried out under normal conditions. The performance of the Arabic handwritten classifier system is investigated via several experiments depending on different classification types.

Even though methods for Arabic handwritten characters have been proposed by means of several approaches, they are still inadequate in terms of accuracy. In this paper, we study the classification of Arabic handwritten characters in two forms, separated letters and separated words, which leads to a relatively comprehensive list of the characters that are used in any Arabic text. In other words, the presented study may be considered an investigation aiming to build a system that identifies handwritten Arabic text. We solve the problem by using the conventional character recognition method (feature extraction followed by classification). This helps greatly to find sharp reference differences between the Arabic handwritten characters and stands as a primary step toward automatic Arabic handwritten text recognition. The approach is based on a combination of entropy and WT to extract the features of each character from its binary image. The obtained feature vector is fed to the PNN to be classified, as seen in the flow chart presented in Fig. 5.

Fig. 4. Competitive transfer function

Table 2 tabulates the results of the proposed method for separated Arabic handwritten characters. The training-to-testing ratio was 90:10. The mean recognition rate (RR) reached 84.69%. A comparative study of the proposed feature extraction method with other methods was performed. In the next experiment, several feature extraction methods with the PNN classifier were analyzed to expose the usefulness of the proposed system in Arabic handwritten character recognition.
The following experiment investigates the proposed method in terms of recognition rate. Table 3 tabulates the results of DWT at level five with Shannon entropy (SEDWT), WP at level five with Shannon entropy (SEWP), DWT at level five with log energy entropy (LEDWT) and WP at level five with log energy entropy (LEWP). The recognition rates of the WP methods (83.1% and 72.64%) are superior to those of the DWT methods (81.96% and 64.11%). On the other hand, the Shannon entropy methods (83.1% and 81.96%) are superior to the log energy methods (72.64% and 64.11%).
The proposed feature extraction method with segmentation of the word into characters (Table 4) and the proposed feature extraction method with separated word recognition (Table 5) are investigated. For all these methods, the PNN classifier is utilized. The experiments were conducted on four words (written by 27 individual writers), as tabulated in Table 4 and Table 5. The best recognition rate obtained was 83.75%, for the separated word method.
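The four feature sets compared in this experiment differ in two choices: the decomposition (DWT or WP) and the entropy (Shannon or log energy). The sketch below illustrates how they could be built. It assumes the Haar (db1) filter pair, the non-normalized Shannon entropy and the log energy entropy $\sum_i \log(s_i^2)$ with zero coefficients skipped. The function names are illustrative and this is not the experimental code.

```python
import math

def haar_step(samples):
    """One Haar (db1) analysis step: returns (approximation, detail)."""
    x = list(samples)
    if len(x) % 2:
        x.append(x[-1])  # assumption: pad odd lengths by repeating the last sample
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def dwt_subbands(samples, depth):
    """DWT: only the approximation is split again; returns depth + 1 sub-bands."""
    bands, approx = [], list(samples)
    for _ in range(depth):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def wp_subbands(samples, depth):
    """WP: every node is split again; returns 2**depth sub-bands."""
    nodes = [list(samples)]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes

def shannon_entropy(coeffs):
    """-sum(c**2 * log(c**2)) over the nonzero coefficients."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def log_energy_entropy(coeffs):
    """sum(log(c**2)) over the nonzero coefficients."""
    return sum(math.log(c * c) for c in coeffs if c != 0.0)

def features(samples, depth, decompose, entropy):
    """One entropy value per sub-band of the chosen decomposition."""
    return [entropy(band) for band in decompose(samples, depth)]

column = [float((3 * i) % 11) for i in range(64)]  # hypothetical image column
sedwt = features(column, 5, dwt_subbands, shannon_entropy)
sewp = features(column, 5, wp_subbands, shannon_entropy)
ledwt = features(column, 5, dwt_subbands, log_energy_entropy)
lewp = features(column, 5, wp_subbands, log_energy_entropy)
print(len(sedwt), len(sewp))
```

At level five the DWT yields 6 sub-bands while the WP yields 32, so the WP feature vector is considerably longer for the same input column.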

CONCLUSION
In this paper, a handwritten Arabic character recognition system based on Shannon entropy and the wavelet packet is studied. The benefit of such a method is its capability to reduce the data to fewer values, which also improves computing speed. At the beginning of feature extraction, WT is applied to the handwritten Arabic character image (column by column). In the next step, the Shannon entropy is obtained from the wavelet coefficients and used as a characteristic feature vector. For classification, PNN is applied. The recognition performance of this method was demonstrated on an Arabic character database from a total of 27 individual writers. In the experiments, 1108 different images were used. Experimental results showed that both DWT and WP linked with Shannon entropy are suitable feature extraction methods. However, WP produced better performance in terms of recognition rate. The results of WP improved as the number of levels increased, unlike those of DWT, which showed no clear improvement when the feature vector was lengthened.