
Pattern Recognition

Volume 65, May 2017, Pages 251-264

Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models

https://doi.org/10.1016/j.patcog.2016.12.026

Highlights

  • We evaluate comprehensively neural network language models (NNLMs) and hybrid NNLMs in handwritten Chinese text recognition.

  • We apply CNNs to over-segmentation and geometric context modeling in addition to character recognition.

  • By training NNLMs on a large corpus and integrating CNN shape models, we achieve new state-of-the-art performance on standard datasets.

  • We analyze the upper bound of performance of the text recognition system by calculating the lattice error rate.

Abstract

Handwritten Chinese text recognition based on over-segmentation and path search integrating multiple contexts has proven successful, and both the language model (LM) and the character shape models play important roles in it. Although back-off N-gram LMs (BLMs) have dominated for decades, they suffer from the data sparseness problem, especially at high orders. Recently, neural network LMs (NNLMs) have been applied to handwriting recognition with superior results to BLMs. With the aim of improving Chinese handwriting recognition, this paper evaluates the effects of two types of character-level NNLMs, namely feedforward neural network LMs (FNNLMs) and recurrent neural network LMs (RNNLMs). Both FNNLMs and RNNLMs are also combined with BLMs to construct hybrid LMs. For fair comparison with BLMs and a state-of-the-art system, we evaluate in a system with the same character over-segmentation and classification techniques as before, and compare various LMs using a small text corpus used previously. Experimental results on the Chinese handwriting database CASIA-HWDB validate that NNLMs improve the recognition performance, and hybrid RNNLMs outperform the other LMs. To report a new benchmark, we also evaluate selected LMs on a large corpus, and replace the baseline character classifier, over-segmentation, and geometric context models with convolutional neural network (CNN) based models. The performance on both the CASIA-HWDB and the ICDAR-2013 competition datasets is improved significantly. On the CASIA-HWDB test set, the character-level accurate rate (AR) and correct rate (CR) reach 95.88% and 95.95%, respectively.

Introduction

For the past 40 years, the field of handwritten Chinese text recognition (HCTR) has seen tremendous progress [1], [2]. However, it remains a challenging problem due to the diversity of writing styles, the difficulty of character segmentation, the large character set, and the unconstrained language domain. The recognition approach based on over-segmentation, integrating the character classifier with geometric and linguistic context models, has proven successful in handwritten text recognition [3]; both the linguistic context model (i.e., the language model) and the character shape models are of great importance.

Statistical language models, which give the prior probability of a sequence of characters or words, play an important role in many applications such as character and speech recognition, machine translation, and information retrieval. Although back-off N-gram language models (BLMs) were proposed over 20 years ago [4], [5] and have been used in handwritten text recognition for more than 10 years, they remain a favorable choice. BLMs have been applied in a wide variety of text recognition systems [3], [6], [7], [8], [9], [10], [11], [12], [13], and have boosted recognition performance substantially.

Generally, higher-order language models can capture longer context patterns and thus estimate sequence probabilities more accurately. Carpenter [14] found that the performance of a character N-gram model keeps improving significantly up to 8-gram, given sufficient training samples. However, traditional BLMs suffer from the data sparseness problem: the number of parameters grows exponentially with the length of the context (the curse of dimensionality), preventing these models from estimating long contexts reliably. Recently, a new type of language model, the neural network language model (NNLM), was proposed to address data sparseness through a continuous representation of words [15], and achieves large perplexity reductions compared with BLMs. Since then, NNLMs have been successfully used in speech recognition [16], [17], machine translation [18], [19], and handwriting recognition [20], [21]. Meanwhile, many extensions of NNLMs and related algorithms have been proposed, aiming to improve model performance [17], [22], [23], [24], [25] or to reduce time complexity [26], [27], [28], [29]. In particular, previous work has focused on either feedforward NNLMs (FNNLMs) [15], [16], [18], [19], [20], [21] or recurrent NNLMs (RNNLMs) [17], [28], [30], [31]. Nevertheless, to the best of our knowledge, except for our previous work on FNNLMs [21], there has been no systematic evaluation of NNLMs for over-segmentation based text recognition systems.
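To make the back-off idea concrete, the following is a minimal sketch of a character-level back-off N-gram: when the full context was never observed, the model drops the oldest history character and retries at a lower order. For brevity it uses the simplified "stupid back-off" with a constant penalty rather than the exact Katz discounting of [4]; all names and the toy corpus are illustrative.

```python
from collections import Counter

def ngram_counts(chars, n):
    """Count all character n-grams of order 1..n in a training sequence."""
    counts = Counter()
    for order in range(1, n + 1):
        for i in range(len(chars) - order + 1):
            counts[tuple(chars[i:i + order])] += 1
    return counts

def backoff_prob(counts, history, char, alpha=0.4):
    """p(char | history): if the full context was never seen, back off to a
    shorter history, paying a constant penalty per dropped character."""
    penalty, h = 1.0, tuple(history)
    while h:
        if counts[h] > 0 and counts[h + (char,)] > 0:
            return penalty * counts[h + (char,)] / counts[h]
        penalty *= alpha          # penalize each back-off step
        h = h[1:]                 # drop the oldest history character
    # unigram floor with add-one smoothing so unseen characters get mass
    vocab = {k[0] for k in counts if len(k) == 1}
    total = sum(v for k, v in counts.items() if len(k) == 1)
    return penalty * (counts[(char,)] + 1) / (total + len(vocab))
```

The exponential blow-up of contexts is visible even here: the number of distinct history tuples grows with the order, so most high-order counts are zero and the model falls back to short histories, which is exactly the sparseness that NNLMs address with continuous representations.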

Apart from the language model, the character classifier [32], over-segmentation [33], [21], [34], and geometric context models [35] (generally called shape models in this paper) are also important to text recognition performance. CNN based classifiers for Chinese characters achieved superior performance in the ICDAR 2013 competition [36], with much higher accuracies than traditional classifiers. Using CNNs, the handwriting recognition community has reported many important advances [37], [38], [39] in recognition accuracy. Recently, by integrating the traditional normalization-cooperated direction-decomposed feature map (directMap) with a deep CNN, Zhang et al. [40] obtained the highest accuracies to date for both the online and offline sessions of the ICDAR-2013 competition database. For over-segmentation, many algorithms have been proposed to deal with touching characters, but most are based on heuristic rules [33], [34], [41], [42], which makes them difficult to generalize from one application to another. A few learning based techniques have been explored [10], [43], [44]; however, only the method in [44] was successfully applied to HCTR, and it is only suitable for the single-touching situation. As for the geometric context models, although many researchers have shown that they can improve recognition accuracy [3], [9], [35], [45], [46], there has been no work using deep learning based geometric models.

In this paper, we evaluate the effects of two types of character-level NNLMs, namely FNNLMs and RNNLMs, with the aim of improving Chinese handwriting recognition. Both FNNLMs and RNNLMs are also combined with BLMs to construct hybrid LMs. For fair comparison with BLMs and a state-of-the-art system, we evaluate in a system with the same character over-segmentation and classification techniques as before, and compare various LMs using the small text corpus used by a previous system. In experiments on the Chinese handwriting database CASIA-HWDB, the comparison of a number of LM variants shows that NNLMs improve the recognition performance, and hybrid RNNLMs outperform the other LMs. To provide a new benchmark, we then evaluate selected LMs on a large corpus. We also replace the baseline character classifier, over-segmentation algorithm, and geometric context models with CNN-based models to further improve the accuracy of HCTR. With these changes, the performance on both the CASIA-HWDB and the ICDAR-2013 competition datasets is improved significantly. Specifically, on the CASIA-HWDB dataset, the character-level accurate rate (AR) and correct rate (CR) reach 95.88% and 95.95%, respectively, compared to the previous results of 90.75% AR and 91.39% CR (with candidate character augmentation) [3], 91.73% AR and 92.37% CR (with language model adaptation) [12], and 95.21% AR and 96.28% CR (with a CNN character classifier) [32].

The major contributions of this work are threefold. First, we perform a comprehensive evaluation of NNLMs in handwritten Chinese text recognition and propose hybrid NNLMs to improve the performance. Second, we apply CNNs to over-segmentation and geometric context modeling in addition to character recognition. Third, by training NNLMs on a large corpus and integrating CNN shape models, we achieve new state-of-the-art performance on standard datasets. In addition, we analyze the upper bound of performance of the text recognition system by calculating the lattice error rate, which indicates the potential for future improvement.

The rest of this paper is organized as follows: Section 2 reviews related work; Section 3 gives an overview of the handwritten Chinese text recognition system; Section 4 describes the FNNLMs and RNNLMs, as well as techniques for accelerating them; Section 5 presents the CNN based models, including the character classifier, over-segmentation algorithm, and geometric context models; Section 6 presents experimental results; and Section 7 offers concluding remarks.


Related works

The neural network architecture has a strong impact on the performance of NNLMs, and comparative studies have been conducted by some researchers [28], [47], [48], [49], [50], [51]. Mikolov et al. [28] gave an empirical comparison between RNNLMs and FNNLMs on two corpora, and found that simple RNNLMs outperformed the standard FNNLMs in terms of perplexity (PPL) on both the Penn Tree Bank and the Switchboard corpus. Mikolov et al. [47] presented PPL results obtained with several advanced language

System overview

Our system is based on the integrated segmentation-and-recognition framework, which typically consists of the steps of over-segmentation of a text line image, construction of the segmentation–recognition candidate lattice, and path search in the lattice with context fusion. The diagram of our system is shown in Fig. 1; the tasks of document image pre-processing and text line segmentation are assumed to have been accomplished externally.

First, the input text line image is over-segmented into a
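The pipeline just described can be sketched as a beam search over a toy candidate lattice whose arcs carry character hypotheses with classifier and geometric log scores, fused with a language model score along each path. The lattice layout, score names, and fusion (a plain unweighted sum) are illustrative only, not the system's actual trained models or combination weights.

```python
import math

def search_lattice(arcs, num_nodes, lm_score, beam=10):
    """Best-path beam search over a segmentation-recognition lattice.

    arcs[i] lists (j, char, cls, geo): an arc from node i to node j
    hypothesizing `char`, with character-classifier and geometric-context
    log scores. lm_score(history, char) supplies the LM log score."""
    # beams[i]: hypotheses (total_log_score, char_history) ending at node i
    beams = {0: [(0.0, ())]}
    for i in range(num_nodes):
        # keep only the best `beam` hypotheses before expanding node i
        for score, hist in sorted(beams.get(i, []), reverse=True)[:beam]:
            for j, char, cls, geo in arcs.get(i, []):
                # fuse classifier, geometric, and linguistic contexts
                new = score + cls + geo + lm_score(hist, char)
                beams.setdefault(j, []).append((new, hist + (char,)))
    best_score, best_hist = max(beams[num_nodes - 1])
    return ''.join(best_hist), best_score
```

For example, a three-node lattice where node 0 can reach node 2 either through two single-character segments ('a' then 'b') or one merged segment ('X') lets the search weigh segmentation against recognition confidence along each path.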

Neural network language models

To overcome the data sparseness problem of traditional BLMs, we introduce two types of NNLMs, FNNLMs and RNNLMs, in this section.

If the sequence C contains m characters, p(C) can be decomposed as
$$p(C) = \prod_{i=1}^{m} p(c_i \mid c_1^{i-1}),$$
where $c_1^{i-1} = c_1, \ldots, c_{i-1}$ denotes the history of character $c_i$. An N-gram model considers only the $N-1$ history characters in (2):
$$p(C) = \prod_{i=1}^{m} p(c_i \mid c_{i-N+1}^{i-1}) = \prod_{i=1}^{m} p(c_i \mid h_i),$$
where $h_i = c_{i-N+1}^{i-1} = c_{i-N+1}, \ldots, c_{i-1}$ ($h_1$ is null). Although FNNLMs can be trained with larger context
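The character-level factorization of p(C), together with the linear interpolation commonly used to build hybrid LMs from an NNLM and a BLM, can be sketched as follows; the conditional-probability callback and the interpolation weight `lam` are placeholders, not the paper's trained models.

```python
import math

def hybrid_prob(p_nn, p_bo, lam=0.5):
    """Hybrid LM: linearly interpolate a neural LM probability with a
    back-off N-gram probability; lam would be tuned on held-out text."""
    return lam * p_nn + (1 - lam) * p_bo

def sequence_logprob(chars, cond_prob, n=3):
    """log p(C) = sum_i log p(c_i | h_i), where h_i holds at most the
    n-1 preceding characters (empty for the first character)."""
    total = 0.0
    for i, c in enumerate(chars):
        h = tuple(chars[max(0, i - n + 1):i])
        total += math.log(cond_prob(h, c))
    return total
```

Because the factorization is a product of per-character conditionals, any model (BLM, FNNLM, RNNLM, or a hybrid) can be plugged in as `cond_prob` without changing the decoder.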

Convolutional neural network shape models

Motivated by the success of deep learning [66], [67] in various domains, we replace the modules of the HCTR framework, namely the character classifier, over-segmentation, and geometric context models, with convolutional neural network (CNN) [68] based models in place of traditional methods. These models take character or text images as input, and are therefore generally called shape models.

Experimental results

We evaluated the performance of our handwritten Chinese text recognition system on two databases: a large database of offline Chinese handwriting called CASIA-HWDB [54] and a small dataset from the ICDAR 2013 Chinese Handwriting Recognition Competition [36], abbreviated as ICDAR-2013. The system was implemented on a desktop computer with an Intel Core i7-4790 3.60 GHz CPU, programmed in C++ with Microsoft Visual Studio 2008. For training the NNLMs and CNN shape models, we also used an NVIDIA Titan X

Conclusion

In this paper, we evaluated the effects of two types of character-level NNLMs, namely FNNLMs and RNNLMs, with the aim of improving Chinese handwritten text recognition. Both FNNLMs and RNNLMs were also combined with BLMs to construct hybrid LMs. We evaluated in a text line recognition system with the same character over-segmentation and classification techniques as in a state-of-the-art system, and compared various LMs trained on a small text corpus as used before. Experimental results on the Chinese

Acknowledgments

We would like to thank Zhuo Chen and Xin He for the help in implementing CNN based over-segmentation, and Xiang-Dong Zhou for sharing the idea of lattice error rate. This work has been supported by the National Natural Science Foundation of China (NSFC) grants 61305005, 61273269, 61573355, and 61411136002.

Yi-Chao Wu received the B.S. degree in Automation from Xidian University, Xi'an, China, in 2012. He is currently pursuing his Ph.D. degree in Pattern Recognition and Intelligent Systems at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include handwritten text recognition, language modeling, and sequence pattern recognition.

References (83)

  • T.-H. Su et al.

    Off-line recognition of realistic Chinese handwriting using segmentation-free strategy

    Pattern Recognit.

    (2009)
  • M.-K. Zhou et al.

    Discriminative quadratic feature learning for handwritten Chinese character recognition

    Pattern Recognit.

    (2016)
  • R.-W. Dai et al.

    Chinese character recognition: history, status and prospects

    Front. Comput. Sci. China

    (2007)
  • Q.-F. Wang et al.

    Handwritten Chinese text recognition by integrating multiple contexts

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • S. Katz

    Estimation of probabilities from sparse data for the language model component of a speech recognizer

    IEEE Trans. Acoust. Speech Signal Process.

    (1987)
  • S.F. Chen, J. Goodman, An empirical study of smoothing techniques for language modeling, in: Proceedings of the 34th...
  • U.-V. Marti et al.

    Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system

    Int. J. Pattern Recognit. Artif. Intell.

    (2001)
  • H. Bunke et al.

    Offline recognition of unconstrained handwritten texts using HMMs and statistical language models

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2004)
  • S. Espana-Boquera et al.

    Improving offline handwritten text recognition with hybrid HMM/ANN models

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • X.-D. Zhou et al.

    Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • A. Bissacco, M. Cummins, Y. Netzer, H. Neven, PhotoOCR: reading text in uncontrolled conditions, in: Proceedings of the...
  • R. Messina, J. Louradour, Segmentation-free handwritten Chinese text recognition with LSTM-RNN, in: Proceedings of the...
  • B. Carpenter, Scaling high-order character language models to gigabytes, in: Proceedings of the Workshop on Software,...
  • Y. Bengio et al.

    A neural probabilistic language model

    J. Mach. Learn. Res.

    (2003)
  • T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur, Recurrent neural network based language model, in:...
  • H. Schwenk, Continuous space translation models for phrase-based statistical machine translation, in: Proceedings of...
  • H. Schwenk, A. Rousseau, M. Attik, Large, pruned or continuous space language models on a GPU for statistical machine...
  • Y.-C. Wu, F. Yin, C.-L. Liu, Evaluation of neural network language models in handwritten Chinese text recognition, in:...
  • A. Mnih, G. Hinton, Three new graphical models for statistical language modelling, in: Proceedings of the 24th...
  • T. Morioka, T. Iwata, T. Hori, T. Kobayashi, Multiscale recurrent neural network based language model, in: Proceedings...
  • K. Irie, R. Schlüter, H. Ney, Bag-of-words input for long history representation in neural network-based language...
  • A. Mnih, G.E. Hinton, A scalable hierarchical distributed language model, in: Proceedings of the Advances in Neural...
  • F. Morin, Y. Bengio, Hierarchical probabilistic neural network language model, in: Proceedings of the AISTATS, vol. 5,...
  • A. Mnih, Y.W. Teh, A fast and simple algorithm for training neural probabilistic language models, in: Proceedings of...
  • T. Mikolov, S. Kombrink, L. Burget, J.H. Černocký, S. Khudanpur, Extensions of recurrent neural network language...
  • Y. Bengio et al.

    Adaptive importance sampling to accelerate training of a neural probabilistic language model

    IEEE Trans. Neural Netw.

    (2008)
  • T. Mikolov, A. Deoras, D. Povey, L. Burget, J. Černocký, Strategies for training large scale neural network language...
  • S. Kombrink, T. Mikolov, M. Karafiát, L. Burget, Recurrent neural network based language modeling in meeting...
  • S. Wang, L. Chen, L. Xu, W. Fan, J. Sun, S. Naoi, Deep knowledge training and heterogeneous CNN for handwritten Chinese...
  • C.-L. Liu et al.

    Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • X.-D. Zhou, J.-L. Yu, C.-L. Liu, T. Nagasaki, K. Marukawa, Online handwritten Japanese character string recognition...


    Fei Yin is an Associate Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received the B.S. degree in Computer Science from Xidian University of Posts and Telecommunications, Xi'an, China, the M.E. degree in Pattern Recognition and Intelligent Systems from Huazhong University of Science and Technology, Wuhan, China, the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1999, 2002 and 2010, respectively. His research interests include document image analysis, handwritten character recognition and image processing. He has published over 30 papers at international journals and conferences.

    Cheng-Lin Liu received the B.S. degree in Electronic Engineering from Wuhan University, Wuhan, China, the M.E. degree in Electronic Engineering from Beijing Polytechnic University (current Beijing University of Technology), Beijing, China, the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation of Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a Postdoctoral Fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a Research Staff Member and later a Senior Researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. From 2005, he has been a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the Director of the laboratory. His research interests include pattern recognition, image processing, neural networks, machine learning, and the applications to character recognition and document analysis. He has published over 200 technical papers at prestigious international journals and conferences. He won the IAPR/ICDAR Young Investigator Award of 2005. He serves on the editorial board of Pattern Recognition Journal, Image and Vision and Computing, International Journal on Document Analysis and Recognition, and Cognitive Computation. He is a Fellow of the IAPR and the IEEE.
