Research on a Method of Character Recognition for Self-learning Errors

Because writing habits differ from person to person, handwritten numerals are difficult to identify. No matter which kind of network is used, the computer cannot judge whether the network's output is correct, which reduces the recognition rate. To improve the recognition rate, this paper proposes a character recognition method based on self-learning of errors. Experiments on the MATLAB simulation platform show that the proposed method improves recognition accuracy.


Introduction
Optical character recognition technology includes handwritten character recognition and printed character recognition. As a branch of handwritten character recognition, handwritten numeral recognition is a very important research direction. In recent years, with the development of computers and pattern recognition technology, character recognition has been widely applied to postal codes, financial amounts, robots [1,2], and artificial assistant suits [3,4]. Although the variety of classifiers has been further enriched, researchers still cannot find an algorithm that achieves perfect results. The artificial neural network, with its strong self-learning ability, adaptability, classification ability, fault tolerance, and fast recognition, has attracted much attention and has been widely used in character recognition [5]. Common neural networks include the BP neural network and the CMAC neural network [6]. Neural networks can also be combined with other intelligent controls, for example in fuzzy neural networks [7]. In this paper, the probabilistic neural network is selected for the study of handwritten numeral recognition, and the character recognition rate is improved by a re-recognition method. Using data from the MNIST database, it is shown that the recognition rate improves when the self-learning error recognition method is applied.

Fundamentals of Probabilistic Neural Networks
The theoretical basis of probabilistic neural networks is the Bayesian minimum-risk criterion. The basic principle of the Bayesian classifier is as follows: given the prior probability of an object, Bayes' formula yields its posterior probability, and the class with the largest posterior probability is chosen as the class to which the object belongs [8].
For ease of analysis, assume the classification to be made is c = c1 or c = c2, and let x = [x1, x2, ..., xn] be the input vector. The decision rule is: classify x as c1 if p(c1|x) > p(c2|x), and as c2 otherwise, where p(c1|x) is the posterior probability of class c1 given x. According to Bayes' formula, the posterior probability is p(c1|x) = p(x|c1) p(c1) / p(x).
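As a minimal sketch of this rule, the posterior comparison can be computed directly from assumed likelihoods and priors (the numbers below are illustrative, not from the paper's experiments):

```python
import numpy as np

def posterior(likelihoods, priors):
    """Bayes' rule: p(ci|x) = p(x|ci) p(ci) / p(x).
    likelihoods[i] = p(x|ci), priors[i] = p(ci)."""
    joint = np.asarray(likelihoods) * np.asarray(priors)
    return joint / joint.sum()  # divide by the evidence p(x)

# Example: x is twice as likely under c1, priors are equal -> choose c1.
post = posterior([0.4, 0.2], [0.5, 0.5])
chosen = int(np.argmax(post))  # index 0 corresponds to c1
```

With equal priors the posterior is proportional to the likelihood, so the class under which x is more likely wins.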
In practice, loss and risk must also be considered. Samples belonging to c1 may be assigned to c2, and samples belonging to c2 may be assigned to c1; both cause losses, so the classification rule should be adjusted.
The adjusted Bayes decision rule becomes: classify x as c1 if R(c1|x) < R(c2|x), where R(ci|x) = Σ_j λ(ci|cj) p(cj|x) is the expected risk of classifying the input as ci, and λ(ci|cj) is the loss incurred by deciding ci when the true class is cj.
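The risk-adjusted rule can be sketched with an assumed loss matrix (the costs below are hypothetical, chosen to show the rule overriding the raw posterior):

```python
import numpy as np

def risk_decision(posteriors, loss):
    """Expected risk R(ci|x) = sum_j loss[i][j] * p(cj|x);
    choose the class with the minimum expected risk."""
    risks = np.asarray(loss) @ np.asarray(posteriors)
    return int(np.argmin(risks)), risks

# Assumed costs: misclassifying a true c2 sample as c1 is 5x worse
# than the reverse (rows = decision, columns = true class).
loss = [[0.0, 5.0],
        [1.0, 0.0]]
decision, risks = risk_decision([0.6, 0.4], loss)
```

Here c1 has the larger posterior (0.6), but the asymmetric loss makes deciding c2 the lower-risk choice, which is exactly the adjustment the rule describes.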
Training of the probabilistic neural network is simple and converges quickly, so it can be fully realized in real-time processing, and the network shows good performance.

Implementation of handwritten numeral recognition system
In practical applications, handwritten numerals are generally provided in the form of pictures. The original image is obtained by an input device, and handwritten numeral recognition then proceeds through pre-processing, character segmentation, feature extraction, and classifier selection [9]. Fig. 1 shows the process of handwritten numeral recognition.

Pretreatment
Preprocessing can improve the final recognition rate, so it is very important. It mainly consists of denoising, filtering, and similar operations. The digital images used for digit recognition need to be binarized, and image segmentation is necessary for image analysis [10]. The data used in this article come from the MNIST database and are not preprocessed; the database samples are used directly in the experiments.
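The binarization step mentioned above can be sketched as a simple threshold on a grayscale image (the threshold value 128 is an assumption, not a value from the paper):

```python
import numpy as np

def binarize(img, threshold=128):
    """Map a grayscale image (0-255) to a 0/1 image by thresholding."""
    return (np.asarray(img) >= threshold).astype(np.uint8)

# Toy 2x2 grayscale patch: values at or above 128 become ink (1).
img = np.array([[0, 200],
                [130, 50]])
binary = binarize(img)
```

In a real pipeline the threshold would typically be chosen adaptively (e.g. per image), but a fixed cutoff is enough to illustrate the operation.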

Feature extraction
The importance of feature extraction [11] in image recognition is self-evident. Commonly used handwritten digit features are structural features and statistical features. To obtain better results, this paper combines the two kinds of characteristics for image feature extraction, obtaining a 14-dimensional feature: eight structural features and six statistical features. These 14 values constitute a feature vector, and each digital image is represented by this feature vector.
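The 8 + 6 concatenation can be sketched as follows. The specific features below (zone ink counts as "structural", global pixel statistics as "statistical") are placeholders assumed for illustration; the paper does not list its exact choices:

```python
import numpy as np

def extract_features(img):
    """Concatenate 8 structural and 6 statistical placeholder features
    into one 14-dimensional vector, mirroring the paper's layout."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # 8 "structural" placeholders: ink totals in a coarse 2x4 grid of zones
    cells = [img[r * h // 2:(r + 1) * h // 2,
                 c * w // 4:(c + 1) * w // 4].sum()
             for r in range(2) for c in range(4)]
    # 6 "statistical" placeholders: global moments of the pixel values
    stats = [img.mean(), img.std(), img.sum(),
             img.max(), img.min(), float((img > 0).mean())]
    return np.array(cells + stats)

vec = extract_features(np.eye(8))  # toy 8x8 "image"
```

Whatever the concrete features, the point is that every image is reduced to one fixed-length vector, which then becomes the network input.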

The experimental results and improvements
The selected MNIST dataset [12] consists of 60000 training samples and 10000 test samples. All training samples were used to train the network, and the 10000 test samples were then tested. Table 1 shows the recognition results at different network spread (diffusion) rates. To improve the recognition rate of the network, this paper proposes a recognition method based on self-learning of errors.

Fig. 3. The block diagram of improved system
The number of each digit in the selected test samples is known, so we can create a matrix with 1 row and 10000 columns: if a test sample is the digit 0, its entry is marked as 1, and so on, as shown in Table 2. In the experiment, the recognition rates obtained by training the network at different spread values are shown in Table 1; the network recognizes best when the spread is 0.1. To improve the recognition rate further, the network can make up for mistakes by learning from its errors. Comparing the network's results with the expected outputs of the test samples yields a comparison result: if the two are the same, the network outputs the recognition result; if they differ, the network has misidentified the sample.
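The error-collection step described above amounts to an element-wise comparison of predictions against expected labels; a minimal sketch with toy data:

```python
import numpy as np

# Toy stand-ins for the expected-output row vector and the network output.
expected = np.array([0, 1, 2, 3, 4])   # desired label per test sample
predicted = np.array([0, 1, 7, 3, 9])  # network output per test sample

# Indices where the network misidentified the sample.
wrong = np.flatnonzero(predicted != expected)

# Pair each misrecognized image index with its expected output,
# the data needed for the second (error-learning) training pass.
error_pairs = list(zip(wrong.tolist(), expected[wrong].tolist()))
```

In the paper's setting the same comparison runs over all 10000 test samples, producing the list of 1949 misrecognized images discussed next.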
According to the desired output, we establish a matrix containing the indices of the misrecognized images and the corresponding expected outputs. Statistics show that altogether 1949 images were not recognized correctly; some of the values are shown in Table 3. The feature vectors of the misrecognized images and their expected outputs are then selected to train on the errors. The final recognition rate of the network is the sum of the two recognition results. On the MATLAB simulation platform, the final recognition rates shown in Table 4 are obtained. Comparing Table 1 and Table 4, it can be seen that self-learning of errors improves the recognition rate of the network. When the network spread is 0.1, the final recognition rate is 100%, nearly 20% higher than before.
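The reported figures are internally consistent and can be checked with simple arithmetic: 1949 errors out of 10000 implies a first-pass rate of 80.51%, so correcting all of them lifts the rate by 19.49 points, matching the "nearly 20%" claim:

```python
# Worked check of the paper's figures at spread 0.1.
n_test = 10000
n_errors = 1949  # misrecognized images on the first pass

first_pass_rate = (n_test - n_errors) / n_test        # 0.8051
final_rate = (n_test - n_errors + n_errors) / n_test  # 1.0 if all errors fixed
improvement = final_rate - first_pass_rate            # 0.1949, "nearly 20%"
```

The 100% figure depends on every error being corrected in the second pass, which in turn requires the expected outputs to be known, the limitation addressed in the next paragraph.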
In practice, the computer cannot establish in advance the desired output corresponding to the image to be recognized. To solve this problem, this paper designs a method that sets up the expected output of the image to be recognized using the Euclidean distance between feature vectors. However, using Euclidean distance as the similarity criterion introduces some errors, so we use the network's recognition result to modify the expected output and obtain the final expected output of the test samples. Simulating the network under this scheme, the recognition rate is 86.32%, an improvement of 6% over the network before the improvement.

Conclusion
In this paper, the probabilistic neural network is analyzed in depth, and the feasibility and validity of probabilistic-neural-network-based handwritten numeral recognition are explored. An automatic error-learning recognition method is designed to improve the probabilistic neural network. If the expected output of the test samples can be obtained, the recognition rate of the network reaches 100%; however, the expected output is not easy to obtain, and in that case the recognition rate can be increased by 6%. The establishment of the expected output still needs in-depth study.

Fig. 2.
Fig. 2. The probabilistic neural network in experiments
It can be seen from Fig. 2 that the probabilistic neural network used in the experiments has 14 inputs. The training data come from the MNIST dataset, which contains 60000 training samples and 10000 test samples; no subset of the 60000 training samples is selected, and all of them are used directly, so there are 60000 neurons in the hidden layer. Because the digits zero to nine are to be identified, the summation layer has ten neurons. The final classification result is a single value, so the output layer has one neuron.
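The layer structure described above (one pattern neuron per training sample, one summation neuron per class, an argmax output) can be sketched as follows; the toy data and the Gaussian kernel form are assumptions for illustration, with the spread value 0.1 taken from the paper:

```python
import numpy as np

def pnn_classify(x, train_X, train_y, n_classes, spread=0.1):
    """Minimal PNN sketch: Gaussian pattern layer over all training
    samples, per-class summation layer, single argmax output neuron."""
    # Pattern layer: one Gaussian kernel response per training sample.
    d2 = ((train_X - x) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2.0 * spread ** 2))
    # Summation layer: accumulate kernel responses by class label.
    sums = np.zeros(n_classes)
    np.add.at(sums, train_y, k)
    # Output neuron: the class with the largest summed response.
    return int(np.argmax(sums))

# Toy 2-D feature vectors standing in for the paper's 14-D features.
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
train_y = np.array([0, 0, 1])
label = pnn_classify(np.array([0.05, 0.0]), train_X, train_y, n_classes=2)
```

With 60000 training samples the pattern layer has 60000 neurons, as in Fig. 2; training consists only of storing the samples, which is why the network trains quickly.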

Table 2.
The expected output of test samples

Table 3.
A partial misrecognition of the image number and the corresponding expected output result

Table 4.
The improved network recognition rates at different spread values
The Euclidean distance between two vectors a and b is d(a, b) = sqrt(Σ_i (a_i − b_i)²). The smaller the Euclidean distance between two vectors, the greater their similarity. Therefore, the Euclidean distance between the feature vector of each test sample and the feature vectors of the 60000 training samples is computed, and the index of the training sample with the smallest Euclidean distance to the test sample is found.
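This nearest-neighbor label construction can be sketched as follows (toy 2-D vectors stand in for the 14-dimensional features; variable names are assumptions):

```python
import numpy as np

def nearest_label(test_vec, train_X, train_y):
    """Expected output for a test sample: the label of the training
    sample at the smallest Euclidean distance from its feature vector."""
    dists = np.linalg.norm(train_X - test_vec, axis=1)
    return int(train_y[np.argmin(dists)])

# Toy training set: one sample of digit 3, one of digit 8.
train_X = np.array([[0.0, 0.0], [5.0, 5.0]])
train_y = np.array([3, 8])
label = nearest_label(np.array([4.5, 5.2]), train_X, train_y)
```

Scanning 60000 training vectors per test sample is what makes this step expensive, and the possibility of a nearest neighbor with the wrong label is the source of the residual error the paper corrects with the network's own output.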