Japanese Hiragana Handwriting Pattern Recognition Using Template Matching Correlation Method

Hiragana is one of the traditional Japanese scripts, used to write native Japanese words. Recognizing an object requires a learning process based on the unique features shared by similar objects, and hiragana letters are quite difficult to distinguish manually. This paper describes a system that differentiates hiragana letters, starting with preprocessing (grayscale conversion and thresholding), followed by segmentation and normalization, with image classification performed by the Template Matching Correlation method. Testing gives a recognition rate of about 76% using the Template Matching Correlation method. The remaining 24% of test images were not identified as the intended letter.


I. INTRODUCTION
The application of the five senses to computers is increasingly common in current technological research. A computer system can recognize an object by identifying its functions and characteristics [1]. Sample objects are first processed into reference images, and other images are then trained using an appropriate algorithm. The aim is for the computer to assist with things that humans cannot easily identify, for example fingerprints, handwriting patterns, and faces.
Japanese culture is currently in great demand in Indonesia, whether presented through manga, anime, film, or music, so many Indonesians are interested in learning Japanese. Japanese has a close historical relationship with surrounding countries, namely China [2], but because the language has different grammatical rules, different types of letters, and different ways of reading and writing, students often find it difficult and lose the motivation to learn it [3][4][5].
Hiragana and katakana letters were both created by the Japanese themselves. Hiragana letters represent syllables and are used to write native Japanese words (not loanwords), as furigana that shows how to read a kanji, and as an alternative to other letters when the writer has not memorized the kanji [6][7]. The script consists of 104 letters: 46 basic letters, 25 letters that use tenten (゛) and maru (゜), and 33 letters that use a small ya, yu, or yo in combination.
In general, pattern recognition classifies or describes something based on quantitative measurements of the main characteristics or properties of an object [8][9]. A pattern is a form with a regular basic structure that can be observed and shown. Formally, pattern recognition is a process that receives patterns or signals based on measurement results and classifies them into one or more specific categories or classes through feature extraction and distance classification.
Feature extraction works by extracting unique values from an object. It aims to sharpen the differences between patterns so that class categories are easier to separate in the classification process [10]. Distance classification then classifies the extracted features: identification is obtained by matching the test image against reference images that were previously trained and stored in the database. In other words, this process refines the data so that it is easier to use in subsequent processes [11].
Several researchers have used the template matching correlation algorithm to recognize various types of letters. Suryo Hartanto used it to recognize fonts, where the problem is how a recognition technique can handle computer letters of different sizes, thicknesses, and shapes. Distance classification using template matching correlation achieved a high accuracy of about 92.90%, even though the type and size of the input letters differed from the template [12]. Research on Hijaiyah letter pattern recognition by Fathurrahman using backpropagation neural networks achieved a best accuracy of 100% [13].
Building on these previous studies, this study tests an Optical Character Recognition (OCR) algorithm, namely template matching correlation [14], on handwriting pattern recognition, using Japanese hiragana letters as the example, to help students recognize handwritten letters whose unusual shapes make them difficult to read. The method has been shown to identify letters, so students can practice writing and recognizing these letters more easily.

A. System Requirements Analysis
Analysis of system requirements supports the process of building the system. The analysis is divided into two parts: functional and non-functional requirements. Analysis of functional requirements identifies the features that the system will provide. These features are: the system is displayed properly, the system can load a selected image, the system can convert an RGB image to a grayscale image, and the system can identify patterns in the image and describe the similarity of the test image to the reference image.
Analysis of non-functional requirements determines the requirement and feasibility specifications of the system. The requirements specification covers hardware and software. The hardware is a computer with an AMD APU A9-9400 processor (up to 3.2 GHz), 4 GB of DDR4 RAM, and a Radeon R5 M430 graphics card with 2 GB of dedicated VRAM. The software is the Microsoft Windows 10 operating system and MATLAB R2017a. The images examined by the system are 5 basic hiragana letters. Each letter has 50 handwritten images of the same size, namely 540x540 pixels.
The next step is binarization, which converts an image into a binary image. A binary image has only two values, black and white. Here a white pixel has the value 0 and a black pixel has the value 1, so the background of the image is white and the letter characters are recognized as black.
Segmentation separates the observation region of each character to be processed; in other words, it groups object pixels into regions that each represent an object. The boundary between the object and the background found in the segmentation stage is then treated as a series of moves along the object outline, such as moving straight, turning left, turning right, or following other distinctive lines. This stage sharpens the pattern against the background so that the system can more easily separate object categories in the classification process using a chain code.
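The binarization convention described above (black letter pixels become 1, white background pixels become 0) can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function name `binarize` and the threshold of 128 are assumptions, since the paper does not state the threshold value.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 integers) to a binary image.

    Following the convention in the text, dark (ink) pixels map to 1 and
    light (background) pixels map to 0. The default threshold of 128 is an
    assumed value, not one reported in the paper.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Tiny hypothetical 3x3 grayscale patch: dark stroke on a light background.
gray = [
    [250, 248, 251],
    [30, 25, 240],
    [245, 20, 252],
]
binary = binarize(gray)
for row in binary:
    print(row)
```

With this convention the letter strokes are the pixels equal to 1, which is what the later chain code stage traces.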

C. Chain Code Algorithm
The chain code algorithm is one way to describe the morphological structure of an object. The chain code works on binary (black and white) images and uses eight directions, numbered 0 to 7, to calculate the perimeter and area, as shown in Fig. 2.
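As a rough illustration of how a chain code yields a perimeter, the sketch below assumes the common Freeman numbering in which 0 points right and the codes increase counter-clockwise (so 5 is lower left). Even codes are horizontal or vertical steps of length 1, and odd codes are diagonal steps of length sqrt(2). The function name `chain_code_perimeter` is hypothetical, and the paper's exact formula may differ.

```python
import math


def chain_code_perimeter(codes):
    """Estimate the perimeter of an outline from its 8-direction chain code.

    Assumes codes 0, 2, 4, 6 are axis-aligned steps (length 1) and
    codes 1, 3, 5, 7 are diagonal steps (length sqrt(2)).
    """
    if any(code not in range(8) for code in codes):
        raise ValueError("chain codes must be integers from 0 to 7")
    even_steps = sum(1 for code in codes if code % 2 == 0)
    odd_steps = len(codes) - even_steps
    return even_steps + math.sqrt(2) * odd_steps


# Outline of a square: two steps right, two down, two left, two up.
square = [0, 0, 6, 6, 4, 4, 2, 2]
print(chain_code_perimeter(square))  # 8.0
```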

D. Template Matching Correlation Method
The distance classification stage uses the template matching correlation method, a digital image processing technique that matches each part of an image against a reference image (template). The input image is compared with the template images in the database, and the similarity is computed using a fixed rule. A match with a high level of similarity means that the image is recognized as that template image. The similarity between two image matrices is measured by their correlation value r, obtained using (2):

\[ r = \frac{\sum_{k=1}^{n} (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\sqrt{\sum_{k=1}^{n} (X_{ik} - \bar{X}_i)^2 \, \sum_{k=1}^{n} (X_{jk} - \bar{X}_j)^2}} \tag{2} \]

where \bar{X}_i and \bar{X}_j are given by (3) and (4):

\[ \bar{X}_i = \frac{1}{n} \sum_{k=1}^{n} X_{ik} \tag{3} \]

\[ \bar{X}_j = \frac{1}{n} \sum_{k=1}^{n} X_{jk} \tag{4} \]

Information:
r = correlation between the two matrices
X_{ik} = value of pixel k in matrix i
X_{jk} = value of pixel k in matrix j
\bar{X}_i = average pixel value of matrix i
\bar{X}_j = average pixel value of matrix j
n = number of pixels in a matrix

For example, suppose two feature vectors give n = 30, X_i = 699, X_j = 282, and X_ik . X_jk = 6861. The template matching correlation value of the x and y vectors is then obtained from (2).
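The correlation described above, and its use for choosing the best-matching template, can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation: the function names `correlation` and `best_match`, the template labels, and the feature vectors are all hypothetical, and returning 0.0 for a constant vector is an assumed convention.

```python
import math


def correlation(x_i, x_j):
    """Correlation coefficient r between two equal-length vectors."""
    n = len(x_i)
    if n == 0 or n != len(x_j):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_i = sum(x_i) / n
    mean_j = sum(x_j) / n
    numerator = sum((a - mean_i) * (b - mean_j) for a, b in zip(x_i, x_j))
    denominator = math.sqrt(
        sum((a - mean_i) ** 2 for a in x_i) * sum((b - mean_j) ** 2 for b in x_j)
    )
    if denominator == 0:
        # A constant vector has no defined correlation; treating it as
        # "no similarity" is an assumption made for this sketch.
        return 0.0
    return numerator / denominator


def best_match(test_vector, templates):
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: correlation(test_vector, templates[label]))


# Hypothetical 8-value feature vectors (for example, direction counts).
templates = {
    "a": [4, 0, 2, 1, 3, 0, 5, 1],
    "i": [1, 3, 0, 4, 0, 2, 1, 5],
}
test_vector = [5, 0, 2, 1, 3, 1, 4, 1]
print(best_match(test_vector, templates))  # a
```

A value of r close to 1 indicates that the two vectors vary together, so the template with the largest r is taken as the recognized letter.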

III. RESULTS AND DISCUSSION
Letter recognition is carried out based on the results of the preceding distance classification, while displaying the image and the text read from the processed image. The identification system can only be tested after it has been trained [15]. Testing is done by entering a new image that has never been used before, identifying it against the reference images, and observing the resulting level of similarity. Reference and test images are prepared in equal numbers for each hiragana letter, namely 20 reference images and 20 test images. The images used in the system are illustrated in Table I.
Initial processing begins by entering an image as input to the hiragana letter identification process; the new image has the same size as the reference images stored on the computer. The next stage converts the RGB image to a grayscale image, as shown in Fig. 3.
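The RGB-to-grayscale conversion can be sketched as a weighted sum of the three color channels. The weights below (0.299, 0.587, 0.114) are the widely used luminance weights and are an assumption here, since the paper does not state which weights were applied; the function name `rgb_to_gray` is likewise hypothetical.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255).

    Uses the common luminance weights 0.299, 0.587, 0.114 (assumed).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 2x2 image: white, black, pure red, pure blue.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = rgb_to_gray(rgb)
print(gray)  # [[255, 0], [76, 29]]
```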
The next step is preprocessing, which includes grayscaling and binarization, followed by extraction of image features using a chain code that maps the pixel array according to the cardinal directions. The mapping begins at the pixel in the upper left corner and proceeds to the next pixel on the right, and so on. An example of the chain code applied to a 7x7 pixel image of the letter E is shown in Fig. 4. The search continues to the right until a pixel with a value of 1 is found, and then moves from that pixel to another pixel with a value of 1. If the direction used to reach the next pixel is 0 (straight right), the system increases the count stored in the database for direction 0 by 1. If the direction used is 5 (lower left), the system increases the count for direction 5 by 1, and so on. The feature extraction values computed in MATLAB are the totals for each direction used to trace the pixels with a value of 1, as shown in Table II.
The system then displays the similarity between the two images, namely the test image and the template image, and reports which letter the image was recognized as. One result of the identification test on the letter E is shown in Fig. 5.
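The direction counting described above can be sketched as follows. This is a simplified trace, not the authors' MATLAB implementation: it scans in raster order for the first pixel equal to 1, then repeatedly steps to an unvisited neighboring 1-pixel, trying directions 0 to 7 in order, and counts each direction used. The direction numbering (0 = right, counter-clockwise, so 5 = lower left) is assumed from the text, and the name `direction_histogram` is hypothetical.

```python
# (row, column) offsets for directions 0..7; 0 = right, counter-clockwise.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def direction_histogram(binary):
    """Count how often each of the 8 directions is used while tracing 1-pixels."""
    rows, cols = len(binary), len(binary[0])
    counts = [0] * 8
    # Raster scan from the upper left corner for the first pixel equal to 1.
    start = next(
        ((r, c) for r in range(rows) for c in range(cols) if binary[r][c] == 1),
        None,
    )
    if start is None:
        return counts
    visited = {start}
    r, c = start
    while True:
        for code, (dr, dc) in enumerate(OFFSETS):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and binary[nr][nc] == 1 and (nr, nc) not in visited:
                counts[code] += 1  # record that this direction was used once
                visited.add((nr, nc))
                r, c = nr, nc
                break
        else:
            # No unvisited neighboring 1-pixel remains: the trace ends.
            return counts


# Hypothetical L-shaped stroke.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(direction_histogram(image))  # [2, 0, 0, 0, 0, 0, 2, 0]
```

The resulting list of eight counts plays the role of the per-direction totals that the text stores as the feature vector for each image.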

The test on image E shows that the system works properly: it confirms that the input image is the letter E. If the system reports a letter other than E, the system still runs, but the result is incorrect. The test results for all 200 test images are summarized in Table IV. In Table IV, each letter has 40 images, 20 used as reference images and 20 as test images. For the letter A, 13 images are recognized correctly as A, 2 images are recognized as O, and 1 image each is recognized as U, E, Ka, Ki, and Ko. For the letter I, 15 images are recognized correctly, 3 images are recognized as Ke, and 1 image each is recognized as U and Ku. For the letter U, 18 images are recognized correctly and 1 image each is recognized as I, Ka, and Ko. For the letter E, 15 images are recognized correctly, 3 images are recognized as U, and 1 image each is recognized as A and Ka. For the letter O, 13 images are recognized correctly, 4 images are recognized as A, 2 images are recognized as E, and 1 image is recognized as Ki.
For the letter Ka, 17 images are recognized correctly and 1 image each is recognized as I, O, and Ke. For the letter Ki, 14 images are recognized correctly, 3 images are recognized as O, and 1 image each is recognized as A, Ke, and Ko. For the letter Ku, 18 images are recognized correctly and 1 image each is recognized as I and Ka. For the letter Ke, 12 images are recognized correctly, 4 images are recognized as I, and 1 image each is recognized as A, Ka, Ki, and Ku. For the letter Ko, 17 images are recognized correctly, 2 images are recognized as E, and 1 image is recognized as A. The overall success rate is about 76%, which is quite high given that the size and type of the letters used as input differ from the reference images.
The success of pattern recognition is calculated as the percentage of successful readings out of the number of trials. The success rate is 76%, with 152 successful readings out of 200 attempts. This result is comparable to the wood image identification system in [15], which has an accuracy of about 77.5%. Off-line testing with a template matching algorithm in [14] resulted in an accuracy of 85%, and the study in [12] by Suryo Hartanto, entitled Optical Character Recognition using the Template Matching Correlation algorithm, achieved a higher total accuracy of 92.90%. The difference in success rate in this study is influenced by the amount of data, namely a total of 400 images across the 10 letters. The input image is adjusted to minimize noise and is resized to match the reference image. The template matching correlation algorithm depends heavily on the number of reference images: the more images used, the greater the system's success rate in identifying a letter and matching it to a reference image.
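The 76% figure can be checked directly from the per-letter counts of correctly recognized images reported in the discussion of Table IV. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Correctly recognized test images per letter, as reported in the text.
correct = {
    "A": 13, "I": 15, "U": 18, "E": 15, "O": 13,
    "Ka": 17, "Ki": 14, "Ku": 18, "Ke": 12, "Ko": 17,
}
TESTS_PER_LETTER = 20  # 20 test images for each of the 10 letters

total_correct = sum(correct.values())
total_tests = TESTS_PER_LETTER * len(correct)
accuracy = 100 * total_correct / total_tests
print(f"{total_correct}/{total_tests} = {accuracy:.1f}%")  # 152/200 = 76.0%

# Per-letter recognition rate, useful for spotting the weakest letters.
per_letter = {letter: 100 * hits / TESTS_PER_LETTER for letter, hits in correct.items()}
print(min(per_letter, key=per_letter.get))  # Ke
```

The per-letter rates range from 60% (Ke) to 90% (U and Ku), so the overall figure hides a noticeable spread between letters.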
Table IV shows the test results for all letters, both correctly and incorrectly recognized. The reading time is roughly the same for each image, averaging 3 seconds, but if a reading is incorrect, the time accumulates until the reading is correct. Reading the letter image this quickly does not burden the system or affect whether the result is right or wrong, as is also the goal when implementing a parking system with vehicle number plate image recognition [16].
Tests are carried out on the system as a whole to assess whether it meets the functional requirements. The results for hiragana letter pattern recognition are consistent with previous studies, as shown by the tests in the activity diagram in Fig. 5 and in Table IV, which indicates that the functional requirements have been met by the system.
IV. CONCLUSION
The pattern recognition system using chain code and template matching correlation has been successfully applied to Japanese hiragana letters with an accuracy of 76%. This shows that the method can identify letters, so that students of Japanese can practice writing and recognizing these letters more easily. For future work, the system needs more reference and test images, because the method depends heavily on the number of images used; with more images, the system will have a database that is better at recognizing a character from a test image. The images used should be primary data and free of background noise, so that the system can more easily convert the original image into a binary image and carry out the subsequent processes.