Multiclass Recognition of Offline Handwritten Devanagari Characters using CNN

Every writer's handwriting varies in style, skew and slant, which makes the recognition of handwritten documents a challenging task. This article surveys the methods available in the literature for Devanagari handwritten character recognition and implements a recognizer using a convolutional neural network (CNN). The available methods are compared on several parameters, and a tabular comparison is presented that indicates the superiority of the CNN model for the character recognition task. The proposed CNN model, trained with dropout and the stochastic gradient descent (SGD) optimizer, achieves acceptable accuracy.


Introduction
Optical character recognition (OCR), the recognition of characters and words in scanned documents, is one of the major applications of pattern recognition. For scanned documents, whether printed or handwritten, OCR involves pre-processing, segmentation and feature extraction, followed by classification and post-processing. For scanned printed documents, various tools with high recognition accuracy are available on the market (Breuel, 2008). Some tools also perform well on neatly handwritten documents, but recognition systems still need to become more accurate (Adak, 2019). Improving the results of recognition systems therefore remains an active research area. Applications of this research include the digitization of handwritten notes and forms, bank cheque processing and postal code recognition. In India, Devanagari is one of the most popular scripts in the central and northern parts of the country. In rural areas, most documentation work is done in Devanagari script. Hindi, an official language of India, is also written in Devanagari. This paper focuses on the existing work in the literature on Devanagari handwritten character recognition. The script consists of 49 basic characters in total: 13 vowels (swar), 33 consonants (vyanjan) and 3 compound characters, as listed in Table 1. Every character has a horizontal line at its top, called the headline (or shirorekha), which joins up when characters are combined. The script is written from left to right. One of its fundamental characteristics is that when a consonant is followed by a vowel, the shape of the consonant is modified by adding a modifier (or matra) at the left, right, top or bottom, depending on the vowel. Sometimes a character with a distinct orthographic shape, known as a conjunct (or sanyuktakshar), is formed when a consonant or vowel is combined with another consonant. Examples are shown in Table 1.
Table 1. Devanagari characters
The major contributions of this article are as: (1)
The rest of the paper is arranged in five sections. Section 2 presents the challenges of Devanagari handwriting recognition systems. Section 3 discusses the existing recognition schemes for offline handwritten Devanagari script at the numeral and character level. Sections 4 and 5 present the experimental setup and results for Devanagari character recognition using the CNN model. Section 6 concludes the article and outlines future work.

Challenges in Devanagari Handwriting Recognition
Offline recognition of Devanagari handwritten documents poses many challenges. Variations in each writer's style, skew and slant make it difficult to segment a document from text lines into words and then from words into characters. Because of its modifiers, the script has a large character set compared with non-Indic scripts such as Latin, which makes recognition more challenging. Poorly scanned documents, images captured with low-resolution cameras, spots, broken strokes and blurring add noise to the document. Recognition of historical documents is also very challenging because of low-quality manuscripts, missing standard characters and unknown font sizes. The presence of many confusable strokes and compound/conjunct characters makes developing a Devanagari handwriting recognition system harder still.

Recognition of Offline Handwritten Devanagari Script
Many Indic languages, such as Hindi, Marathi, Sanskrit and Nepali, are written in Devanagari script. Indic scripts show more variation in character shape than non-Indic scripts such as Latin, Chinese, Korean and Japanese, which makes building a handwriting recognition system for Indic scripts more challenging (Bharath and Madhvanath, 2009). Kumar et al. (2018) discussed key issues in character and numeral identification in Indic and non-Indic scripts. Kaur et al. (2019) presented a review of online and offline character recognition covering Indic and non-Indic scripts. Singh et al. (2012) reviewed the potential of OCR in various fields. Pal and Chaudhuri (2004) and Pal et al. (2012) reviewed the methodologies applied to the recognition of Indian language scripts. Prasad (2014) also presented an in-depth literature survey of Indic script recognition systems. More recently, convolutional neural networks (CNNs) have proved important in various fields, and Indic script recognition is one of them (Mehrotra et al., 2013; Acharya et al., 2015; Maitra et al., 2015; Singh et al., 2016). Numeral and character recognition are detailed in the following subsections.

Devanagari Numeral Recognition
Many feature extraction and classification techniques are used in the literature. Bajaj et al. (2002) used statistical features, mainly density features, moment features of the right, left, upper and lower profiles, and descriptive component features. Elnagar and Harous (2003) worked on structural (or shape) features, namely the stroke and cavity features of thinned cursive handwritten Hindi numerals. Ramteke and Mehrotra (2006) worked on features based on moments, image partition, principal component analysis (PCA), correlation coefficients and perturbed moments. Garain et al. (2006) explored the strength of a clonal selection algorithm for the 10-class classification problem of handwritten numerals. Sharma et al. (2006) divided each numeral into blocks (the method was also applied to characters), calculated the chain code histogram in each block, and classified the resulting chain code features with a quadratic classifier. Another approach represents the numerals as modified exponential membership functions fitted to fuzzy sets; the box approach is used to obtain normalized distance features, which are then passed to a fuzzy set classifier. Hanmandlu et al. (2007b) also used features extracted by the box approach for handwritten Hindi numerals, followed by a bacterial foraging method. Patil and Sontakke (2007)

Devanagari Character Recognition
Arora et al. (2007) used structural properties of characters for recognition, focusing on the headline (or shirorekha), spine and intersection properties. The structural features are then classified by a feed-forward neural network. Hanmandlu et al. (2007a) extracted features using the vector distance method and selected a fuzzy set classifier for character classification. Pal et al. (2007b) used gradient and Gaussian filters for feature extraction and chose a quadratic classifier for classification.
Bhattacharya and Chaudhuri (2008) presented a multistage cascaded scheme based on wavelet-based multi-resolution features and multilayer perceptron (MLP) classifiers. Deshpande et al. (2008) derived features from chain codes and then used regular expressions and the minimum edit distance (MED) method, among others, for character recognition. Pal et al. (2008) described two sets of features: directional information, obtained from the arc tangent of the gradient, and curvature, derived from gradient information. They combined support vector machines (SVM) and the modified quadratic discriminant function (MQDF) for classification. Arora et al. (2009) described different feature extraction and recognition algorithms. In the preliminary recognition step, features based on the chain code histogram, four side views and shadow, among others, are extracted and forwarded to MLP classifiers. In the final step, a weighted majority method combines the results of all the MLPs. Kumar (2009) evaluated five feature extraction approaches on a Devanagari handwritten dataset and reported good performance for an SVM classifier on gradient features. Mane and Ragha (2009) described an elastic image matching (EM) procedure based on eigen-deformations. The deformations of offline handwritten characters include category-dependent tendencies (known as eigen-deformations), which serve as a feature set for elastic matching-based character recognition. Pal et al. (2009b) compared twelve classifiers and four feature sets obtained from the curvature and gradient information of the image, and reported that the Mirror Image Learning (MIL) classifier gave the best recognition results. Arora et al. (2010) proposed two-stage character recognition, using a neural network (NN) in the first stage and minimum edit distance (MED) in the second.
In the first stage, two MLPs are used, one applied to the shadow features of the character and the other to the chain code histogram features. The top three combined results of the MLPs, obtained with a weighted majority rule, are used to compute relative difference values. In the second stage, these relative differences are used to divide the characters into two sets, distinct shapes and confusable (similar) shapes, which are classified using the MLP and MED methods respectively.
A comparative analysis of the methods discussed above for offline Devanagari handwritten document recognition is given in Table 2, which describes various parameters.
Table 2 summarizes the feature extraction and classification methods, along with the accuracies obtained, for numeral and character recognition in Devanagari script. During training, a feature extraction algorithm finds relevant details of the input text image, such as shape and position, and maps them to the corresponding label. The researchers used datasets of varying sizes. In this work, a deep neural network is proposed that consists of four convolutional layers interleaved with max-pooling layers and the rectified linear unit (ReLU) nonlinearity. Dropout is added to the regular CNN architecture to avoid overfitting. The SGD optimizer is selected to find the parameter values and reduce the cost function.
Table 2 continued…
Reference | Data type (samples) | Features | Classifier | Accuracy (%)
Arora et al. (2007) | Character (50000) | Structural | Feed forward neural network | 89.12
Hanmandlu et al. (2007a) | Character (4750) | Vector distance | Fuzzy sets | 90.65
Pal et al. (2007b) | Character (

Experimental Setup
The experiments are carried out on the Devanagari handwritten character dataset (DHCD), which consists of 78,200 training samples and 13,800 test samples. Many experiments are performed by varying the parameters of a Keras Sequential model, and the configuration with the best validation performance is chosen. The proposed CNN configuration is presented in Figure 1 and summarized in Table 3. The sample image (a matrix of pixel values) is fed into the first convolutional layer. Starting from the top left, the kernel (here of size 3x3) moves across the image in steps given by the stride, here 1. At each position, the kernel values are multiplied element-wise with the underlying pixel values and the products are summed to give a single value. Repeating this at every position in the image yields a matrix smaller than the input matrix. The network comprises four convolutional layers interleaved with the rectified linear unit (ReLU) nonlinearity and max-pooling layers. The output of the first convolutional layer becomes the input of the second, and so on for subsequent layers. The ReLU activation function, applied after each convolutional layer, is defined as $f(x) = \max(0, x)$. The pooling layer downsamples the width and height of the image to reduce its volume. If features (e.g. boundaries, lines, curves) were identified in the previous layer, this layer compresses the image to a less detailed representation. A deep learning model has many connections and trainable parameters across its non-linear hidden layers, which makes it prone to overfitting. Therefore, a dropout of 0.2 is used to prevent overfitting. A flatten layer converts the feature maps into the input for the fully connected layers. Two fully connected layers with output sizes 128 and 64 and ReLU activation are then added.
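The convolution, ReLU and max-pooling operations described above can be illustrated with a minimal pure-Python sketch. The 6x6 toy image, the vertical-edge kernel and the function names are made up for illustration and are not part of the proposed model; only the 3x3 kernel size, stride 1 and the absence of padding follow the text.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply kernel values with the underlying pixels and sum them.
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    total += kernel[di][dj] * image[i * stride + di][j * stride + dj]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Toy 6x6 "image" with a vertical edge and a 3x3 vertical-edge kernel (hypothetical).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

conv = conv2d_valid(image, kernel)  # 4x4 output: smaller than the 6x6 input
activated = relu(conv)
pooled = max_pool(activated)        # 2x2 output after 2x2 pooling
```

A real implementation would rely on a library such as Keras for these operations; the loops here only make the arithmetic explicit.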
The last fully connected layer has a dimension equal to the number of output classes (here, 46) and uses the softmax activation function. The softmax function, also known as the normalized exponential function, takes a vector of k real numbers as input and normalizes it into k probabilities proportional to the exponentials of the input numbers. It maps the non-normalized output of the CNN model to a probability distribution over the predicted output classes. The model architecture is applied to DHCD as a multiclass classification problem. The model is compiled with the stochastic gradient descent (SGD) optimizer, and training uses the standard feed-forward pass and back-propagation. The categorical cross-entropy loss function (l) and the accuracy metric are used to assess the performance of the model. Stochastic gradient descent is an iterative optimization algorithm that reduces the loss function, which measures the difference between the actual output and the output predicted by the model, by updating the weights (w) of the neurons at every iteration: $w \leftarrow w - \eta \, \partial l / \partial w$, where $\eta$ is the learning rate and $\partial l / \partial w$ is the gradient.
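As a rough numerical illustration of softmax, categorical cross-entropy and one gradient descent update, consider the sketch below. The three-class logits, the learning rate of 0.1 and the function names are assumptions chosen for the example; for simplicity the update is applied directly to the logits, which stand in for the weights of a real network.

```python
import math


def softmax(logits):
    """Normalize k real numbers into k probabilities."""
    # Subtracting the maximum improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, one_hot):
    """Loss l = -sum(t_i * log(p_i)) for a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


def sgd_step(weights, grads, lr):
    """One update w <- w - lr * dl/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]


logits = [2.0, 1.0, 0.1]  # hypothetical non-normalized outputs for 3 classes
target = [1, 0, 0]        # one-hot label: the true class is class 0

probs = softmax(logits)
loss = categorical_cross_entropy(probs, target)

# For softmax followed by cross-entropy, the gradient of the loss with
# respect to the logits is (probs - target).
grad_logits = [p - t for p, t in zip(probs, target)]
updated = sgd_step(logits, grad_logits, lr=0.1)
new_loss = categorical_cross_entropy(softmax(updated), target)
```

After the step, the logit of the true class increases and the loss decreases, which is the behaviour the update rule is meant to produce.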
The network is trained for 50 epochs with a mini-batch size of 32. An advantage of the CNN model is that it extracts features automatically from the sample images, unlike conventional handcrafted feature extraction methods, and it also predicts the classification probability of an image for each output class using the softmax activation function. For large datasets, CNNs are superior to traditional machine learning methods for image classification.
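A network of the kind described in this section could be written with the Keras Sequential API roughly as follows. This is a sketch under stated assumptions, not the configuration of Table 3: the 32x32 grayscale input shape, the filter counts, the placement of the pooling and dropout layers, and the learning rate of 0.01 are all hypothetical. Only the four 3x3 convolutional layers, ReLU, max pooling, dropout of 0.2, the 128- and 64-unit dense layers, the 46-way softmax output, the SGD optimizer and the categorical cross-entropy loss come from the text.

```python
NUM_CLASSES = 46                 # stated in the text
INPUT_SHAPE = (32, 32, 1)        # assumption: 32x32 grayscale images
CONV_FILTERS = (32, 32, 64, 64)  # assumption: hypothetical filter counts
LEARNING_RATE = 0.01             # assumption: not given in the text


def spatial_size_after(size=32):
    """Width/height left after the conv and pool layers used below."""
    for op in ("conv", "conv", "pool", "conv", "conv", "pool"):
        # A 3x3 convolution without padding removes 2 pixels; 2x2 pooling halves.
        size = size - 2 if op == "conv" else size // 2
    return size


def build_model():
    # Imported lazily so the constants above can be used without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        layers.Conv2D(CONV_FILTERS[0], (3, 3), activation="relu"),
        layers.Conv2D(CONV_FILTERS[1], (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(CONV_FILTERS[2], (3, 3), activation="relu"),
        layers.Conv2D(CONV_FILTERS[3], (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With training and validation arrays prepared (hypothetical names `x_train`, `y_train`, `x_val`, `y_val`, labels one-hot encoded), training as described would be `build_model().fit(x_train, y_train, epochs=50, batch_size=32, validation_data=(x_val, y_val))`.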

Results
Three cases of dataset distribution are presented in Table 4 with their validation and test accuracies. Case 2 obtains the highest validation and test accuracies; its training and validation accuracy and loss curves are shown in Figure 2. To check the superiority of the CNN for image classification, another method is also evaluated. Features are extracted from the second-to-last layer of the trained CNN model proposed above and used to train a random forest (RF) classifier in a supervised manner. The trained RF classifier is then evaluated on the test samples, and the result is presented in Table 5. The RF classifier performs worse than the CNN. The proposed CNN model is also validated on the MNIST dataset, where it obtains an accuracy of 99.28%. The result of the proposed CNN model is compared with that of the CNN model presented by Acharya et al. (2015) on the same dataset (DHCD). The results are tabulated in Table 6 and Table 7.

Conclusion
Various methods have been developed in the literature for Devanagari handwritten character recognition, and among them CNNs perform best for combined character and numeral recognition on large datasets. This article presents a well-performing CNN model for multiclass classification on the Devanagari handwritten character dataset (DHCD). The work is based on a dataset of fair quality. For degraded documents, the performance of the recognition system requires further improvement, for which new and more relevant feature extraction and classification methods can be explored.