Neural Network with Madaline for Machine Printed English Character Recognition

The recognition of optical characters is known to be one of the earliest applications of Artificial Neural Networks, which partially emulate human thinking in the domain of artificial intelligence. In this research, neural networks are applied to the problem of automatically identifying machine-printed English characters. A preprocessing step is implemented to separate each character from the others. A feature extraction process is then applied to each character to minimize the number of input nodes, using the Mean, Standard Deviation, and Variance. A Madaline neural network is trained on the 26 English alphabet characters in a standard font and size. It is then tested on these characters to determine which character each image represents. The system is implemented in MATLAB® 2008a.


Introduction
Neural networks have been applied to a wide variety of areas, including speech synthesis, character recognition, diagnostic problems, medicine, business and finance, robotic control, signal processing, computer vision, and many other problems that fall under the category of pattern recognition [1]. Neural networks have been shown to be particularly useful for problems where traditional artificial intelligence techniques involving symbolic methods have failed or proved inefficient. Such networks have shown promise in low-level, computationally intensive tasks, including vision, character recognition, and speech recognition [2].
One of the most classical applications of Artificial Neural Networks is the Character Recognition System. This system is the base for many different types of applications in various fields, many of which we use in our daily lives [3]. Because it is cost effective and less time consuming, businesses, post offices, banks, security systems, and even the field of robotics employ this system as the base of their operations. When processing a check, performing an eye/face scan at the airport entrance, or even teaching a robot to pick up an object, a Character Recognition system is employed [3].
Character recognition, of both human-written and machine-generated text, is a wide and extensively studied field in Machine Learning. In fact, many commercial scanners nowadays use Optical Character Recognition (OCR) systems that take an image of typed text as input and output a character string [4].
For some application areas, including pattern recognition, neural models show more promise of achieving human-like performance than traditional artificial intelligence techniques [4].
In this research, a machine-printed character recognition system for the English language, for a standard font and size and based on a Madaline neural network model, is developed using MATLAB.
Some English character recognition algorithms use the seven moment invariants, which are applied in many OCR systems [5]; other approaches use neural network techniques and are in practical use [6]. Later, genetic algorithms were introduced to the character detection field [7].
Following this introduction, the research is organized as follows: Section 2 describes the principles of character recognition. The concepts of Neural Networks are introduced in Section 3. Section 4 contains a full description of the proposed algorithm. Results and comparisons are given in Section 5. Finally, Section 6 concludes this paper.

Character Recognition Principles
The field of character recognition is one of the major fields of pattern recognition and has been the subject of much research in the past three decades [8]. Character Recognition, or Optical Character Recognition (OCR), is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into computer-readable text [9].
There are two types of character recognition systems: on-line and off-line systems. Each system has its own algorithms and methods. The main difference between them is that in an on-line system the recognition is performed at the time of writing, while off-line recognition is performed after the writing is completed [10]. In this research, an off-line recognition system is used.
In an on-line recognition system, also referred to as real-time or dynamic recognition, the machine recognizes the symbols as they are drawn. In such a system, the direction of drawing is important, and there is no need for skeletonization or contour extraction. This property makes the recognition stage of on-line systems easier than that of off-line systems. A digitizing tablet is used as the writing surface in most on-line systems [11] [12].
In off-line recognition systems, input text is read and digitized by an optical scanner. Each character is then located and segmented. The resulting array is fed into a preprocessor for smoothing, elimination of noise, size normalization and other operations, to facilitate the extraction of features in the subsequent stage [11] [12] .

The Concepts of Neural Networks
Neural networks are composed of simple elements operating in parallel, inspired by biological nervous systems. As in nature, the connections between elements largely determine the network function. A neural network can be trained to perform a particular function by adjusting the values of the connections (weights) between elements [13]. Typically, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The next figure illustrates such a situation: the network is adjusted, based on a comparison of the output and the target, until the network output matches the target.
Typically, many such input/target pairs are needed to train a network. Neural networks have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems. Neural networks can also be trained to solve problems that are difficult for conventional computers or human beings.

The Proposed Algorithm
Figure (1) shows an overview of the proposed character recognition system steps applied in this research. The following subsections describe these steps.

Figure (1): Overview of the proposed system (input image; feature extraction on each character; supervised training using the Madaline neural network; classification and recognition; network testing).

Preprocessing
Preprocessing prepares the image for the subsequent stages. It increases the accuracy of the recognition algorithm by applying a number of techniques [14] that improve the algorithm's ability to extract features [15].
After an image is imported into the system, preprocessing begins by separating the image into single lines (slips). Each line is then truncated into its characters so that each character is processed as a single sub-image. The following steps give the details:

Line Segmentation
There is free space between every two lines, and it is used to segment the lines. The horizontal histogram of the image (the number of character pixels in each row) is used to separate the lines, as shown in Figure (2).
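The horizontal-histogram step can be sketched in Python as follows. This is an illustrative re-expression, not the MATLAB code used in this research. It assumes a binary image stored as a list of rows, with 1 for character pixels and 0 for background; the function names are hypothetical.

```python
def horizontal_histogram(image):
    """Number of character (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image):
    """Split a binary image into text lines separated by empty rows.

    Returns a list of (start_row, end_row) pairs; end_row is exclusive.
    """
    hist = horizontal_histogram(image)
    lines, start = [], None
    for r, count in enumerate(hist):
        if count > 0 and start is None:
            start = r                  # first row of a new text line
        elif count == 0 and start is not None:
            lines.append((start, r))   # a blank row closes the current line
            start = None
    if start is not None:              # a line touching the bottom edge
        lines.append((start, len(hist)))
    return lines
```

For example, the image `[[0,0,0],[1,1,0],[0,1,0],[0,0,0],[1,0,1]]` has the histogram `[0, 2, 1, 0, 2]` and yields the two lines `(1, 3)` and `(4, 5)`.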

Character Segmentation
Printed English characters are separable: there is a space between every two characters in a word. In this paper, the vertical histogram is used to separate the characters, as depicted in Figure (3).
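The vertical-histogram step mirrors line segmentation, but counts pixels per column within one text line. The sketch below is a hypothetical Python illustration under the same binary-image assumption (1 = character pixel). It relies on the stated property that printed characters are separated by at least one blank column; touching glyphs would be returned as a single span.

```python
def vertical_histogram(line_image):
    """Number of character (1) pixels in each column of a binary line image."""
    return [sum(col) for col in zip(*line_image)]


def segment_characters(line_image):
    """Return (start_col, end_col) spans of characters separated by empty columns.

    end_col is exclusive, so each row can be sliced as row[start_col:end_col].
    """
    hist = vertical_histogram(line_image)
    spans, start = [], None
    for c, count in enumerate(hist):
        if count > 0 and start is None:
            start = c                  # first column of a new character
        elif count == 0 and start is not None:
            spans.append((start, c))   # an empty column ends the character
            start = None
    if start is not None:              # a character touching the right edge
        spans.append((start, len(hist)))
    return spans


def crop(line_image, span):
    """Cut one character sub-image out of the line using a column span."""
    a, b = span
    return [row[a:b] for row in line_image]
```

For the line `[[1,0,0,1,1],[1,0,0,0,1]]`, `segment_characters` returns `[(0, 1), (3, 5)]`, and cropping the second span gives `[[1, 1], [0, 1]]`.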

Feature Extraction
Feature extraction is the process of representing the image by a suitable set of features [16]. When the input data to an algorithm is too large to be processed and is suspected to be notoriously redundant (much data, but not much information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input [17].
The main purpose of the feature extraction process is to reduce the number of inputs to the classifier while maintaining the important features or properties of the image [18]. The more meaningful the extracted features are, the better the feature extractor. Additionally, the features must satisfy other desirable requirements, such as fast processing speed, low computational cost, and low complexity of the feature extraction technique. Consequently, features that are both simple and powerful are not easily found [19].
Following the practice of some researchers, each separated character is resized after truncation to fit a 25 x 25 image. The resized image is then converted into a data vector of 625 dimensions. This vector is sent to a module that extracts features reducing the number of inputs to the neural network to 3 nodes, using the variance (Var), standard deviation (Std), and Mean. The three inputs are calculated as follows:
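The text does not say which interpolation method is used for resizing. The sketch below assumes nearest-neighbour sampling and row-major flattening; both choices, and the function names, are hypothetical. MATLAB's `X(:)` flattens column-major, but the element order does not affect the Mean, Std, or Var computed afterwards.

```python
SIZE = 25  # target side length from the text: 25 x 25 -> 625 values


def resize_nearest(img, size=SIZE):
    """Resize a 2-D list to size x size by nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_vector(img):
    """Flatten the resized image row by row into a single list."""
    return [p for row in img for p in row]
```

Resizing the 2 x 2 image `[[1, 0], [0, 1]]` produces a 625-element vector in which the top-left and bottom-right blocks are filled with 1s.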

A. Variance (Var)
The variance (σ²) is defined as the sum of the squared distances of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N), as given in equation (1), where x_i is the i-th term:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad (1)$$
V = var(X) returns the variance of X for vectors. For matrices, var(X) is a row vector containing the variance of each column of X. For N-dimensional arrays, var operates along the first non-singleton dimension of X [20].
Unlike equation (1), var normalizes V by N-1 if N > 1, where N is the sample size. The result is an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples. For N = 1, V is normalized by N.
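The difference between equation (1) and the normalization used by MATLAB's `var` can be shown with a small Python sketch. The function names are hypothetical. `matlab_var` only mimics the vector case described above; it is not MATLAB itself.

```python
def population_var(x):
    """Equation (1): sum of squared deviations from the mean, divided by N."""
    n = len(x)
    mu = sum(x) / n
    return sum((xi - mu) ** 2 for xi in x) / n


def matlab_var(x):
    """Vector variance normalized like MATLAB's var: by N-1 if N > 1, by N if N == 1."""
    n = len(x)
    if n == 0:
        raise ValueError("variance of an empty vector is undefined")
    mu = sum(x) / n
    ss = sum((xi - mu) ** 2 for xi in x)
    return ss / (n - 1) if n > 1 else ss / n
```

For `x = [2, 4, 4, 4, 5, 5, 7, 9]` (mean 5, sum of squared deviations 32), equation (1) gives 4.0, whereas the N-1 normalization gives 32/7 ≈ 4.571.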

B. Standard deviation (Std)
A standard deviation is a measure of dispersion around a central value. To compute the standard deviation, the sum of the squared differences between each individual data point and the average of all the data points is taken and then divided by the number of data points included (or, in the case of sample data, the number of data points included minus one). The square root of this value is then taken to obtain the standard deviation [20] .
The standard deviation S of a data vector X is defined in equation (2):
$$S = \left(\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{1/2} \qquad (2)$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and n is the number of elements in the sample. S = std(X), where X is a vector, returns the standard deviation. The result S is the square root of an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples.
If X is a matrix, std(X) returns a row vector containing the standard deviation of the elements of each column of X. If X is a multidimensional array, std(X) is the standard deviation of the elements along the first non-singleton dimension of X [21].
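Equation (2) and the column-wise behaviour for matrices can be sketched in Python as below. The function names are hypothetical. For a single element, the sketch follows the normalization-by-N rule stated for var, so it returns 0.

```python
import math


def std(x):
    """Equation (2): square root of the N-1 normalized variance (N-normalized if N == 1)."""
    n = len(x)
    mean = sum(x) / n
    ss = sum((xi - mean) ** 2 for xi in x)
    return math.sqrt(ss / (n - 1) if n > 1 else ss / n)


def std_columns(matrix):
    """Row vector with the standard deviation of each column of a 2-D list."""
    return [std(list(col)) for col in zip(*matrix)]
```

For the matrix `[[1, 2], [3, 2]]`, `std_columns` returns `[sqrt(2), 0.0]`: the first column varies around its mean 2, and the second column is constant.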

C. Mean (Average or mean value of array)
The mean (average) is defined as the sum of all data points divided by the number of data points N, as shown in equation (3):
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (3)$$
It is the most commonly used measure of central tendency. M = mean(X) returns the mean values of the elements along different dimensions of an array [21].
If X is a vector, mean(X) returns the mean value of X. If X is a matrix, mean(X) treats the columns of X as vectors, returning a row vector of mean values. If X is a multidimensional array, mean(X) treats the values along the first non-singleton dimension as vectors, returning an array of mean values [21] .
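Putting the three measures together, the reduction of a 625-element character vector to the 3 network inputs can be sketched with Python's standard `statistics` module. Its `variance` and `stdev` use the same N-1 normalization as MATLAB's var and std. The function name and the ordering [Mean, Std, Var] are assumptions for illustration.

```python
import statistics


def extract_features(vec):
    """Reduce a 625-element character vector to the 3 inputs [Mean, Std, Var]."""
    if len(vec) != 625:
        raise ValueError("expected a flattened 25 x 25 image (625 values)")
    data = [float(p) for p in vec]
    m = statistics.mean(data)          # equation (3)
    s = statistics.stdev(data)         # equation (2), N-1 normalization
    v = statistics.variance(data)      # N-1 normalization, as MATLAB's var
    return [m, s, v]
```

For a vector with 125 ones and 500 zeros, the mean is 0.2, the variance is 100/624 ≈ 0.160, and the standard deviation is its square root.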

The Madaline Neural Network
The Madaline (Many Adaline) is a multilayer extension of the single-neuron bipolar Adaline to a network. It is also due to B. Widrow (1988). Its basic structure, given in Figure (4), consists of two layers of Adalines plus an input layer that merely serves as the network's input distributor [22].

Madaline Training
Madaline training differs from Adaline training in that no partial desired outputs of the inner layers are, or can be, available; the inner layers are therefore termed hidden layers. Consider the human Central Nervous System (CNS): we may receive learning information in terms of desired and undesired outcomes, yet we are not conscious of the outcomes of the individual neurons inside the CNS that participate in that learning. In the same way, in an ANN no information about the inner layers of neurons is available [22].
The Madaline employs a training procedure known as Madaline Rule II, which is based on a Minimum Disturbance Principle [23], as follows:
1. All weights are initialized at low random values. Subsequently, a training set of L input vectors x_i (i = 1, 2, …, L) is applied one vector at a time to the input.
2. The number of incorrect bipolar values at the output layer is counted, and this number is denoted as the error e for a given input vector.
3. For all neurons at the output layer:
a. Denoting th as the threshold of the activation function (preferably 0), compute |z - th| for every input vector of the given training set, for the particular layer considered at this step. Select the unset neuron that corresponds to the lowest |z - th| occurring over that set of input vectors. Hence, for L input vectors in an input set and a layer of n neurons, the selection is made from n x L values of z. This is the node that can reverse its polarity with the smallest change in its weights, so it is denoted the minimum-disturbance neuron, from which the procedure's name is derived. A previously unset neuron is a neuron whose weights have not been set yet.
b. Subsequently, the weights of this neuron are changed (randomly) such that the bipolar output y of that unit changes.
c. The input set of vectors is propagated to the output once again.
d. If the change in weights reduced the performance cost e of Step 2, the change is accepted. Otherwise, the original (earlier) weights of that neuron are restored.
4. Repeat Step 3 for all layers except the input layer.
5. For all neurons of the output layer: apply Steps 3 and 4 for a pair of neurons whose analog node outputs z are closest to zero.
6. For all neurons of the output layer: apply Steps 3 and 4 for a triplet of neurons whose analog node outputs are closest to zero.
7. Go to the next vector, up to the L'th vector.
8. Repeat for further combinations of L vectors until training is satisfactory.
The same can be repeated for quadruples of neurons, and so on. However, this becomes very lengthy and may therefore be unjustified. All weights are initially set to (different) low random values, which can be positive or negative within some fixed range, say, between -1 and 1. The initial learning rate should be between 1 and 20. For adequate convergence, the number of hidden-layer neurons should be at least 3, preferably higher. It is preferable to use a bipolar rather than a binary configuration for the activation function [22] [23].
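Steps 3a to 3d can be sketched for one layer of Adalines as below. This is an illustrative fragment, not a full Madaline Rule II trainer. `error_fn` is a hypothetical callback that re-propagates the training set and returns the count of incorrect bipolar outputs (Step 2), and `eps` is an assumed margin. The weight change shown is the smallest one along the augmented input vector that puts z just across the threshold; the text also allows the change to be made randomly.

```python
def sign(v):
    """Bipolar activation: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def net(w, x):
    """Analog node output z; w[0] is the bias weight (its input is fixed at 1)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def min_disturbance_neuron(layer, inputs, unset, th=0.0):
    """Step 3a: among unset neurons, pick the one with the smallest |z - th|
    over all n x L (neuron, input vector) pairs."""
    best, best_val = None, float("inf")
    for j in unset:
        for x in inputs:
            d = abs(net(layer[j], x) - th)
            if d < best_val:
                best, best_val = j, d
    return best


def try_reversal(layer, j, x, error_fn, th=0.0, eps=0.1):
    """Steps 3b-3d for neuron j on input x: reverse its output, keep the change
    only if error_fn() decreases, otherwise restore the earlier weights."""
    e_before = error_fn()
    old = list(layer[j])
    xa = [1.0] + list(x)                   # augmented input (bias input = 1)
    z = net(layer[j], x)
    desired = th - eps * sign(z - th)      # just across the threshold
    step = (desired - z) / sum(v * v for v in xa)
    layer[j] = [w + step * v for w, v in zip(old, xa)]   # new z == desired
    if error_fn() < e_before:
        return True                        # 3d: accept the change
    layer[j] = old                         # 3d: restore the earlier weights
    return False
```

For the layer `[[0.0, 1.0], [0.0, 0.1]]` and inputs `[[1.0], [2.0]]`, neuron 1 is selected because its |z| of 0.1 is the smallest. Reversing it moves its output on input `[1.0]` to z = -0.1.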

Design of network
The applied network is shown in Figure (5). It is implemented with 3 layers: input (3 neurons), hidden (3 neurons), and output (26 neurons, one for each English letter from A to Z). The weights of the network are initially set randomly in the range [-1, 1]. The network is trained to output 1 in the correct position of the output vector and 0 in all remaining positions. The element set to 1 indicates the classification of that input pattern.
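The 3-3-26 layout and the one-hot output coding described above can be set up as in the following sketch. The weight layout (bias stored as element 0 of each neuron's row), the fixed random seed, and the helper names are assumptions for illustration.

```python
import random
import string

N_IN, N_HIDDEN, N_OUT = 3, 3, 26        # Mean/Std/Var inputs, 3 hidden, A..Z outputs
LETTERS = string.ascii_uppercase         # index i <-> output neuron i


def init_layer(n_neurons, n_inputs, rng):
    """One row per neuron: [bias, w_1, ..., w_n], each drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def one_hot_target(letter):
    """Desired output: 1 at the letter's position, 0 everywhere else."""
    target = [0] * N_OUT
    target[LETTERS.index(letter)] = 1
    return target


def decode(output):
    """Return the letter whose element is 1, or None if not exactly one is set."""
    ones = [i for i, v in enumerate(output) if v == 1]
    return LETTERS[ones[0]] if len(ones) == 1 else None


rng = random.Random(0)                   # fixed seed for a reproducible example
hidden = init_layer(N_HIDDEN, N_IN, rng)
output = init_layer(N_OUT, N_HIDDEN, rng)
```

Rejecting any output with zero or several 1s makes an ambiguous classification explicit instead of silently choosing a letter.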

Network training algorithm
The following are the basic steps of the training algorithm [23]:
1. Generate a training data set with 3 sets.
. . . Each output is passed as input to the successive layer.
6. The final output is compared with the desired output, and the cumulative error for the 3 inputs is calculated.
7. If the output does not equal the desired output, the weights are changed using:
8. weight_new = weight_old + 2 * constant * output(previous layer) * error
9. The weights are updated and the new error is determined.
10. Weights are updated for the various neurons until there is no error or the error is below a desired threshold.
11. The test data set is fed to the network with the updated weights, and the output (error) is obtained, thereby determining the efficiency of the network.
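The update in steps 7 to 10 is the Widrow-Hoff (LMS) rule. The following sketch trains a single Adaline whose inputs are the previous layer's outputs, reading "constant" as the learning rate c. The value c = 0.05, the tolerance, the epoch limit, and the bipolar AND data in the usage example are assumptions chosen only to illustrate the update; they are not settings from this research.

```python
def lms_update(weights, prev_outputs, error, c):
    """w_new = w_old + 2 * c * output(previous layer) * error, for each weight."""
    return [w + 2.0 * c * y * error for w, y in zip(weights, prev_outputs)]


def train_adaline(samples, c=0.05, max_epochs=1000, tol=1e-3):
    """Train one Adaline until the mean squared error drops below tol
    or max_epochs is reached. samples: list of (inputs, desired) pairs."""
    w = [0.0] * (len(samples[0][0]) + 1)          # w[0] is the bias weight
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, d in samples:
            xa = [1.0] + list(x)                  # bias input fixed at 1
            y = sum(wi * xi for wi, xi in zip(w, xa))   # analog output
            e = d - y
            sq_err += e * e
            w = lms_update(w, xa, e, c)
        if sq_err / len(samples) < tol:           # error below desired threshold
            break
    return w


def classify(w, x):
    """Bipolar decision of the trained Adaline."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if z >= 0 else -1
```

Usage: with `AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]`, the weights from `train_adaline(AND)` settle near the least-squares solution (-0.5, 0.5, 0.5), and `classify` reproduces all four targets. The squared error itself stays above `tol` here because AND is not exactly linear in the analog output, so the epoch limit ends training.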

Brief comparison with previous work
Most OCR systems rely on other features, such as the seven moment invariants, which have been used in many scanners with results good enough for adoption by commercial companies. The algorithm given in this paper instead introduces a neural network to decrease the recognition time while also increasing the recognition accuracy.
The algorithm also adopts statistical factors for feature extraction from the printed characters. Table (1) gives a brief comparison with some other methods.

Table (1): Brief comparison with some methods
Method | Characters used | Main features | Recognition rate
--- | --- | --- | ---
A Stroke-Order Free Chinese Handwriting Input System Based on Relative Stroke Positions and Backpropagation Networks [24] | Chinese symbols | Back-propagation neural network with 3 layers | 90%
Cursive Handwritten Word Recognition Using Multiple Segmentation Determined by Contour Analysis [25] | English characters | Projection profile | 70%
On Optical Character Recognition of Arabic Text [26] | Arabic characters | Segmentation and matching of a candidate character shape to the prebuilt prototypes | 97%
An Efficient Fuzzy Method for Handwritten Character Recognition [27] | English characters | Fuzzy logic | 95%
A Simple and Efficient Optical Character Recognition System for Basic Symbols in Printed Kannada Text [28] | Kannada characters | Seven moments + RBF neural networks | 82%
Offline Handwriting Recognition Using Genetic Algorithm [29] | English characters | Neural networks + genetic algorithms | 71%
Self Evolving Character Recognition Using Genetic Operators [30] | English characters | Genetic operators | 79.23%
Our approach | English characters | Madaline neural network | 100%

Results and Discussion
The results are obtained by applying the algorithm to very similar characters. They show that a reasonable difference between the adopted parameters (Mean, Std, and Var) can be seen clearly, as in Table (2). The results also show that the time needed to train such a network on the parameters of Table (2) is very short (a few microseconds).

Conclusions
In this research, the input image was drawn using a paint program. This image contains multiple lines of text, each containing a number of characters in a single font and size.
After preprocessing, each character image is resized to 25 x 25. This image is converted into a vector, and its Mean, Std, and Var are computed as features. Using these data as input nodes to the network, the Madaline neural network is used to recognize the English character patterns in MATLAB® 2008a. The trained Madaline network determines which character each image represents. The system recognizes the 26 character images used in training with a 100% training recognition rate, and the 26 images used in testing with a 100% testing recognition rate, as can be seen clearly in Table (2).