Yoruba Handwritten Character Recognition using Freeman Chain Code and K-Nearest Neighbor Classifier

This work presents a system for offline Yoruba handwritten character recognition using the Freeman chain code and the K-Nearest Neighbor (KNN) classifier. Most Latin word and character recognition systems have used the k-nearest neighbor classifier and other classification algorithms. This research explores the same recognition capability on Yoruba characters. Data were collected from adult indigenous writers, and the scanned images were preprocessed to enhance the quality of the digitized images. The Freeman chain code was used to extract the features of the digitized images, and KNN was used to classify the characters in feature space. The performance of KNN was compared with that of other classification algorithms, namely Support Vector Machine (SVM) and Bayes classifiers, used for the recognition of Yoruba characters. The recognition accuracy of the KNN classifier with the Freeman chain code was 87.7%, which outperformed other classifiers used on Yoruba characters.


I. INTRODUCTION
Handwritten character recognition has been an extensive area of research over the last three decades [1], [2]. Research in character recognition is popular for its many potential applications in office automation, cheque verification in banks, post offices, and a large variety of business and data entry applications [3], [4]. Other applications include reading aids for the blind, library automation, language processing, and multimedia design [5]-[8].
Geometrical growth in computational power has enabled the implementation of current character recognition methodologies [2], [9]-[12]. It has also created increasing demand in many emerging application domains of character recognition, which require more advanced methodologies [13]-[17]. The process of optical character recognition (OCR) includes scanning the text character by character, analyzing the scanned images, and translating each character image into the character codes commonly used in data processing [6], [14].
There are two major problem domains in handwriting recognition: online and offline [10]. Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or Personal Digital Assistant (PDA), where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching [11]-[14]. The captured handwritten text is converted into letter codes that are used by text-processing applications. In the offline technique, the system recognizes the fixed, static shape of the character [3], whereas online character recognition recognizes the dynamic motion during handwriting. In offline handwriting recognition, the information is available on paper, which is digitized using a scanner [17]-[21].
Yoruba is a native language spoken by over 40 million people in Nigeria and its diaspora [22]. The Yoruba alphabet is tonal, which makes its recognition more difficult than that of the printed English alphabet [5], [20], [23]. The Yoruba orthography is presented in Table 1.
Despite the intensive research on Latin languages, only a few works have addressed indigenous languages [6], [8], [23]. Researchers in handwriting recognition have treated recognition in each language as a distinct problem. Each language has its own character set and language features, which makes it difficult to develop a general algorithm that works for all languages. Many researchers have used the Freeman chain code to extract features of characters or numerals because of its simplicity and small memory requirement. K-Nearest Neighbor (KNN) has also been used because of its simplicity, reduced training time, and good performance. The few works on Yoruba character recognition use Bayes and SVM classifiers. There is a need to test the simplicity of the Freeman chain code and the low training cost of KNN on Yoruba characters, and to determine their efficiency relative to other classifiers. This work presents an approach to recognizing Yoruba handwritten characters based on the Freeman chain code and the KNN classifier.
*) Corresponding author (Jumoke Falilat Ajao). Email: jumoke.ajao@kwasu.edu.ng

II. METHODOLOGY
The recognition system for Yoruba characters comprises data acquisition, preprocessing, feature extraction, classification, and recognition (Figure 1). The framework began with the collection of handwritten characters from indigenous Yoruba writers of the Akomolede group at a secondary school. The character images were captured using an HP scanner at 300 dpi resolution. The handwritten characters were scanned in JPEG format, converted into Portable Network Graphics (PNG) format, and stored in a MySQL database.
Samples of Yoruba handwritten characters were collected from the "Akomolede" Cultural Group of Lower Niger River Basin Staff College, Ilorin. A guided template was used to capture handwriting from the target subjects. The members of the "Akomolede" Cultural Group are secondary school students from various classes (JSS One to SS Three). They learn Yoruba culture and practices and teach them to other students in the college. The majority of these students are Yoruba; a few are from other tribes. The handwritten images were scanned using an HP scanner at 300 dpi resolution (Figure 2). The scanned images were converted into Portable Network Graphics (PNG) format and stored in a database.
A series of preprocessing operations was carried out on the captured and digitized images to enhance them for further processing. In the preprocessing stage, the system carried out data cleaning tasks, which include image cropping, noise removal, grayscale conversion, and binarization. The images were transformed to their discrete form by assigning numbers between 0 and 255 to the light intensity of each point of the discretized images. The original RGB image was converted to grayscale, and the grayscale image was then binarized to a black-and-white image. The Otsu method was used to binarize the image [25]. The binarized image is shown in Figure 3. The digitized image was resized to reduce it to a manageable size and keep the input to the KNN minimal. Cropping removes the unwanted blank space, which is effectively noise, from the four sides of the image so that the image has appropriate dimensions. Cropping, performed with the CorelDraw 12 software, also centers the character in the image (Figure 4).
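As an illustration of the binarization step, the following minimal sketch computes an Otsu threshold from a gray-level histogram and applies it. It is not the authors' implementation: the function names `otsu_threshold` and `binarize`, the 8-bit gray-level input, and the convention that dark ink maps to 1 are assumptions made for this example.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit gray levels (0..255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_bg = 0    # number of pixels with gray level <= t
    sum_bg = 0  # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor 1 / total**2).
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray, threshold):
    """Map a 2D list of gray levels to 1 (dark ink) and 0 (light background)."""
    return [[1 if p <= threshold else 0 for p in row] for row in gray]
```

Calling `otsu_threshold` on the flattened grayscale image and passing the result to `binarize` yields a black-and-white image in which the handwritten strokes are marked with 1.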
The image was normalized by converting images of various dimensions into a fixed dimension to remove handwriting variation and obtain standardized data. The input data was transformed into a reduced set of features using a chain code algorithm [26]. The feature set captures the relevant information from the digitized data so that the desired task can be performed on this reduced representation instead of the full-size input. The Freeman chain code was applied to the raw data to extract the features of the characters, and the KNN algorithm was then applied to the transformed data in feature space.
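The cropping and size normalization described above can be sketched programmatically as follows. This is only an illustrative stand-in: the paper reports cropping with CorelDraw 12 and does not state the target dimension or the resampling method, so the helper names `crop_to_ink` and `resize_nearest`, the nearest-neighbor sampling, and the output size are all assumptions.

```python
def crop_to_ink(binary):
    """Crop a 2D binary image (1 = ink) to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def resize_nearest(image, out_rows, out_cols):
    """Resize a 2D image to a fixed size using nearest-neighbor sampling."""
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

Applying `crop_to_ink` followed by `resize_nearest` brings every character image to the same dimensions regardless of how large it was written, which is the purpose of the normalization step.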
In the handwritten Yoruba character recognition system, an 8-connected chain code feature extraction technique was used, which represents each character with 8-connected chain codes. The underdot and the diacritic signs applied to Yoruba words give them different meanings. This makes it very difficult to extract the different objects or components that constitute unique features for each individual Yoruba character. The feature extraction technique extracts different profiles such as the character, the starting point, the loop, the curvature, the dot, and the diacritic sign. The extracted features were represented in a feature vector matrix as input to the KNN classifier.
The application of the chain code is illustrated with samples of the Yoruba character E with grave accent and underdot, and S with underdot (Figure 5). The symbol in Figure 5(a) represents a vertical line attached to triple linearity, with an underdot and a grave diacritic sign. Figure 5(b) represents dual roundness with dual linearity. The Freeman chain codes generated for these sample characters are shown in Figure 6.
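To make the 8-connected chain code concrete, the sketch below encodes an ordered list of boundary pixels as Freeman directions. The direction numbering (0 = east, increasing counterclockwise), the (row, column) pixel convention, and the normalized direction histogram used as a fixed-length feature vector are assumptions for illustration; the paper does not specify these details.

```python
# Map a (row step, column step) between consecutive boundary pixels to a
# Freeman direction. Rows grow downward, so "north" means row - 1.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def freeman_chain_code(boundary):
    """Encode an ordered list of 8-connected (row, col) pixels as chain codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-connected")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Hypothetical fixed-length feature: relative frequency of each direction."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    n = len(codes) or 1
    return [count / n for count in counts]
```

For a closed square traced as (0,0), (0,1), (1,1), (1,0), (0,0), `freeman_chain_code` returns [0, 6, 4, 2]: one step east, south, west, and north.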
The KNN algorithm was used for the estimation and classification of Yoruba characters. The KNN estimation technique is performed according to Algorithm 1 [14], [16]. Here x represents the point extracted by the Freeman chain code, d is the Euclidean distance, k is the number of neighbors considered, and V(x) is the volume of the region containing the points that make up the character, that is, $V(x) = \pi\rho^2$ in the 2D space of the handwritten images, where $\rho$ is the Euclidean distance between x and the furthest of its k nearest neighbors. After computing the KNN density estimate, the KNN classification rule for the recognition of Yoruba characters was carried out using Algorithm 2. The k closest neighbors of x are searched for among the training points using the Euclidean distance measure with parameter k, where the starting value of k can be chosen arbitrarily and could be even or odd depending on the contour of the character. From the k closest neighbors, the number $k_i$ of points belonging to class $\omega_i$ is identified, with $\sum_{i=1}^{c} k_i = k$. Finally, x is assigned to the class $\omega_i$ for which $k_i > k_j$ for all $j \neq i$, that is, the class to which the majority of the k closest neighbors belong.
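The classification rule can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the system's actual code: the function names, the representation of training data as (feature vector, label) pairs, and the tie-breaking behavior are choices made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(x, training, k):
    """Assign x to the majority class among its k closest training points.

    training is a list of (feature_vector, label) pairs.
    """
    # Step 1: find the k training points closest to x.
    neighbors = sorted(training, key=lambda item: euclidean(x, item[0]))[:k]
    # Step 2: count how many of the k neighbors fall in each class (the k_i).
    votes = Counter(label for _, label in neighbors)
    # Step 3: pick the class with the largest k_i.
    return votes.most_common(1)[0][0]
```

When two classes receive the same number of votes, `Counter.most_common` keeps the class encountered first, which here is the class of the nearer neighbor. Choosing an odd k avoids ties in the two-class case.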

III. RESULTS AND DISCUSSIONS
To validate the developed Yoruba handwritten character recognition system, handwritten character images were selected from the created Yoruba database and tested on the recognition module. The test data were preprocessed. Features were extracted from the preprocessed images and compared with the trained feature vectors. If the two feature sets are similar, the system looks up the appropriate character in the look-up table and displays the corresponding digital equivalent of the handwritten character if found, or returns a False Negative result if not found. The performance of the KNN classifier was compared with other results obtained using different classifiers on different databases. The system was evaluated using the recognition rate.
Recognition was performed on the Yoruba handwritten database created by the author. The system recognized 37 Yoruba characters, distinguished based on diacritical marks, with 61 captured samples each. Table 2 presents the testing analysis of all Yoruba characters captured for this research work. Out of the 61 samples collected and digitized for each character, one sample was used for training the system while the remaining 60 samples were used for testing. All training and testing samples were converted to Portable Network Graphics (.PNG) format to improve the performance of the system. The recognition rates of the characters i, l, í, and r are very low: the characters that seem simple and easy to identify are poorly recognized by the system. The recognition rate of the system was evaluated by calculating the mean over the 37 handwritten characters tested. The average recognition rate of the designed system was 87.7%.
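The evaluation metric described here amounts to a per-character recognition rate averaged over all characters. The sketch below shows that arithmetic; the function names are hypothetical, and the counts in the usage note are invented purely to demonstrate the calculation, not taken from Table 2.

```python
def recognition_rate(correct, tested):
    """Recognition rate of one character, as a percentage."""
    return 100.0 * correct / tested


def mean_recognition_rate(correct_counts, tested_per_char):
    """Average the per-character recognition rates over all characters.

    correct_counts maps each character to its number of correctly
    recognized test samples; every character has tested_per_char samples.
    """
    rates = [recognition_rate(c, tested_per_char) for c in correct_counts.values()]
    return sum(rates) / len(rates)
```

For example, with 60 test samples per character and made-up counts of 60, 45, and 30 correct recognitions for three characters, the per-character rates are 100%, 75%, and 50%, and the mean recognition rate is 75%.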
The result was compared with other classification algorithms used for Yoruba character and English letter recognition (Table 3). The proposed method outperformed Yoruba character recognition using SVM on 600 handwritten images of the Yoruba alphabet [20]. Its performance is also better than that of KNN used in English letter recognition [14] and close to that of a combined Bayes and decision tree classifier applied to only 6 Yoruba uppercase characters [5]. This work achieves the simplicity of KNN with less training time, avoiding the complexity involved in multiple classifiers. It is also comparable with [21], which accurately recognizes Yoruba words, rather than single characters, using a Hidden Markov Model (HMM).

IV. CONCLUSION
The KNN and Freeman chain code method for Yoruba character recognition outperformed the one using SVM, achieving an average recognition accuracy of 87.7% for 37 Yoruba characters distinguished based on diacritical marks. Each character used 61 samples.

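Algorithm 1. KNN estimation process
Consider a set of N points, $x_1, x_2, x_3, \dots, x_N \in Z$, that emanate from the Freeman chain code.
1: Choose a value k.
2: Find the distance between x and all training points $x_i$, $i = 1, 2, 3, \dots, N$, extracted using the Freeman chain code, where the Euclidean distance is computed as $d(x, x_i) = \sqrt{(x - x_i)^T (x - x_i)}$.
3: Find the k nearest points to x.
4: Compute the volume V(x) in which the k nearest neighbors lie.
5: Compute the density estimate by $P(x) = \frac{k}{N V(x)}$.

Algorithm 2. KNN classification process of Yoruba character
Given c classes $\omega_i$, $i = 1, 2, \dots, c$, a point $x \in Z$, and training points $x_i$, $i = 1, \dots, N$, in the 2D space, using the Euclidean distance:
1: Among the N training points, search for the k closest neighbors of x.
2: From the k closest neighbors, identify the number $k_i$ of points that belong to class $\omega_i$, where $\sum_{i=1}^{c} k_i = k$.
3: Assign x to the class $\omega_i$ for which $k_i > k_j$ for all $j \neq i$, that is, the class to which the majority of the k closest neighbors belong.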
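The density estimate of Algorithm 1 can be sketched as follows for points in the 2D plane. The sketch assumes that V(x) is the area of the circle centered at x whose radius is the distance to the k-th nearest point, which is the usual reading of the KNN density estimate; the function name `knn_density` is hypothetical.

```python
import math


def knn_density(x, points, k):
    """KNN density estimate P(x) = k / (N * V(x)) for 2D points.

    V(x) is taken as the area of the circle centered at x that reaches
    the k-th nearest point (an assumption of this sketch).
    """
    # Step 2: distances from x to all N points.
    distances = sorted(math.dist(x, p) for p in points)
    # Step 3: radius that encloses the k nearest points.
    rho = distances[k - 1]
    # Step 4: volume (area in 2D) in which the k nearest neighbors lie.
    volume = math.pi * rho ** 2
    # Step 5: density estimate.
    return k / (len(points) * volume)
```

The estimate is undefined when the k-th nearest point coincides with x, since the radius and therefore V(x) are zero; a practical implementation would need to guard against that case.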

Table 2. Results analysis of training and testing of Yoruba characters

Table 3. Accuracy comparison among methods