Elsevier

Pattern Recognition

Volume 45, Issue 4, April 2012, Pages 1306-1317

Binary segmentation algorithm for English cursive handwriting recognition

https://doi.org/10.1016/j.patcog.2011.09.015

Abstract

Segmentation in off-line cursive handwriting recognition is the process of extracting individual characters from handwritten words. It is one of the most difficult processes in handwriting recognition because characters are very often connected, slanted and overlapping, and handwritten characters also vary in size and shape. Hybrid segmentation techniques, especially over-segmentation followed by validation, are the mainstream approach to the segmentation problem in off-line cursive handwriting recognition. However, the core weakness of the segmentation techniques in the literature is that they carry a high risk of chain failure during an ordered validation process. This paper presents a novel Binary Segmentation Algorithm (BSA) that reduces the risk of chain failure during validation and improves segmentation accuracy. BSA is a hybrid segmentation technique consisting of over-segmentation and validation modules. The main difference between BSA and other techniques in the literature is that BSA adopts an unordered segmentation strategy. The proposed algorithm has been evaluated on the CEDAR benchmark database and the experimental results are very promising.

Highlights

► Unordered segment validation for off-line cursive handwriting recognition.
► Improved segmentation accuracy on the CEDAR benchmark database.
► Improved word recognition accuracy on the CEDAR benchmark database.

Introduction

The role of segmentation is to find correct letter boundaries. Segmentation precedes character recognition: the output of segmentation becomes the input to the character recognition module, so segmentation performance directly affects recognition performance. However, there is a fundamental dilemma in the relationship between segmentation and recognition. The quality of segmentation heavily influences recognition success; conversely, whether a word is segmented correctly may depend on the result of recognition. This dilemma is well known as the chicken-and-egg problem.

To avoid the above-mentioned problem, some researchers prefer to remove the segmentation step altogether and have introduced the holistic, or segmentation-free, approach. The idea is to recognize whole words against a lexicon rather than individual characters. However, the main problem with the holistic approach is the number of classes, which equals the number of words. English has 52 alphabetic characters (upper and lower case). In a segmentation-based approach, a classifier can learn these 52 characters and then recognize any word in a dictionary. A holistic classifier, on the other hand, needs to learn as many classes as there are lexical words. An English dictionary contains more than 200,000 entries, so the classification task involves 52 classes versus more than 200,000. In the holistic approach, training a classifier for so many classes is a major issue because of the amount of data required for learning, whereas training a classifier for English characters requires a very small portion of data in comparison. Therefore, the segmentation-based approach is more practical than the holistic approach for real-world problems.

Segmentation of off-line cursive words into characters is one of the most difficult processes in handwriting recognition, and also one of the most important because it directly affects the result of the recognition process [1], [2], [3], [4], [5], [6], [7], [8], [9]. Researchers in the field have found that over-segmentation and validation (OSV) techniques produce promising results: a heuristic over-segmenter tends to find all possible character boundaries, and the excessive segmentation points it creates are removed from the final segmentation points through a validation process. Multiple intelligent experts are commonly employed as validators during this process. Generally, the experts allocate confidence values to each segmentation point, and unlikely ones are removed from the final segmentation point set.

The segmentation module in handwriting recognition plays a crucial role in successful performance. However, it is very difficult to find precise character boundaries without knowledge about the characters. To avoid this error-prone process, the segmentation-free approach has been proposed as one segmentation strategy. Benouareth et al. [10] used a holistic method for Arabic word recognition. To build a feature vector sequence, two segmentation schemes are incorporated to divide a word into frames. The first is uniform segmentation, which vertically divides a word into equal-sized frames. The second is non-uniform segmentation, which uses variable frame sizes. After segmenting a word into frames, statistical and structural features are extracted by capturing ascenders, descenders, dots, concavity and stroke direction. Vinciarelli [11], [12] used a similar technique for information retrieval from writer-dependent documents. Prior to retrieval, the individual words in the handwritten document need to be recognized correctly. Using a fixed-size sliding window, a density feature is extracted for an HMM, which performs recognition by calculating the likelihood of a word against a lexicon. Similarly, the methodologies presented in [13], [14], [15], [16], [17] adopted the segmentation-free strategy; a sliding window and geometrical feature extraction form the basis of the HMM recognition module. Mozaffari et al. [18] proposed a lexicon reduction scheme for static Farsi handwriting recognition by analyzing dots within characters.

In segmentation-based approaches, a word is segmented into characters and the characters are learned by a classifier. Viard-Gaudin et al. [19] modeled the writing sequence of strokes for a handwritten image. Following the writing order, the word image is segmented into individual strokes, and the k-means algorithm clusters the strokes into representative symbols. Recognition is done by modeling the sequence of symbols with HMMs. Liu and Gader [20] proposed an approach combining over-segmentation with a Dynamic Programming (DP) strategy. A word image is segmented into sub-images, each representing a single character or a partial character. A union function between neighboring sub-images assigns a confidence value based on compatibility, which accounts for the spatial relationships and relative sizes between neighboring unions as judged by artificial neural networks. Recognition is done by DP, which finds the sequence of unions that best fits a given lexicon string. Verma et al. [21] used an over-segmentation strategy based on handwriting features such as upper and lower word contours, holes, upper and lower contour minima and vertical density histograms. Based on heuristics over these features, confidence values are assigned to each over-segmented point. Two artificial neural network based classifiers are incorporated for further validation: the first inspects the characteristics of each segmentation point, and the second validates each segmentation point based on knowledge of the characters.

A heuristic technique proposed in [22] locates structural features and over-segments each word. For each segmentation point, a left primitive (preceding the segmentation point) and a joined primitive (joined at the segmentation point) are extracted along with other features from the segmentation region. These features are fed to segmentation point validation neural networks to produce a number of confidence values, and the final segmentation is decided based on those values. A similar technique, rule-based over-segmentation and validation, is used in [23]: rule-based modules validate every over-segmentation point against closed area, average character size, contour code of the left character, and density. Verma et al. [24] also proposed an over-segmentation and validation approach to the cursive handwriting recognition problem. Three different validation experts produce ranks and confidence values, which are the ingredients of a Borda count; the final word recognition result is the lexical word with the highest Borda count. In [25], over-segmented primitives of cursive handwritten month words, used to process handwritten bank cheques, are fed to an HMM and artificial neural networks to find the optimum segmentation paths. Vellasques et al. [26] proposed an approach that distinguishes validation from filtration of the over-segmentation points. They also note that an excessive number of unnecessary over-segmentation points directly affects recognition performance, and therefore emphasize that filtration of the over-segmentation points should precede validation. In their touching-digit recognition system, they show that the filter can remove up to 83% of the unnecessary over-segmentation hypotheses.

In previous segmentation strategies, the validation module processes the over-segmentation points in an orderly manner, normally left to right. Because it checks the spatial relationship between primitives, validation of the current segmentation point can be affected by the failure of the previous one, a problem referred to here as chain failure. To reduce the risk of chain failure, the Binary Segmentation Algorithm (BSA) is proposed in this paper. BSA segments an image into two sub-images at the most likely segmentation point. Depending on the selection criteria, one sub-image is nominated and segmented again into two sub-images. The selection and segmentation processes are repeated until the termination conditions are satisfied. To maximize the potential of BSA, sub-image selection and termination conditions are the most crucial components of the algorithm.
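The select-and-split loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a word is reduced to a pixel-column span, candidate points carry precomputed confidences, and `min_width` and `max_cuts` stand in for the termination conditions described later (all names are hypothetical). For simplicity both sub-images are queued for further splitting, whereas the paper nominates one sub-image per step via its selection criteria.

```python
def binary_segment(span, candidates, max_cuts, min_width):
    """Recursively bisect `span` at the highest-confidence candidate point.

    span       -- (lo, hi) pixel-column range of the word image
    candidates -- {x_position: confidence} of suspicious segmentation points
    max_cuts   -- cap on the number of segmentation points per word
    min_width  -- rough average character width; no cut is made closer
                  than this to either edge of a sub-image
    """
    cuts, stack = [], [span]
    while stack and len(cuts) < max_cuts:
        lo, hi = stack.pop()
        # candidate points usable inside this sub-image
        inside = [(conf, x) for x, conf in candidates.items()
                  if lo + min_width <= x <= hi - min_width]
        if not inside:
            continue           # termination: no suspicious points remain here
        _, x = max(inside)     # most likely segmentation point
        cuts.append(x)
        stack += [(lo, x), (x, hi)]   # both halves await further splitting
    return sorted(cuts)
```

For example, `binary_segment((0, 40), {10: 0.9, 20: 0.5, 30: 0.8}, max_cuts=5, min_width=5)` cuts first at x = 10, then at 30, then at 20, an unordered cut sequence, and returns `[10, 20, 30]`.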

A primitive produced by over-segmentation is more likely to be a partial character than a complete one. Validation is the process of combining n neighboring primitives to form a character. In the literature, validation is performed by designated classifiers that check whether combined primitives form correct characters; the classifiers typically hold knowledge of legal whole characters. Every spatial combination of primitives is evaluated by the classifiers, which output the likelihood of its being a whole character. If the likelihood exceeds a threshold or pre-set criterion, the evaluated combination is approved. Traditionally, validation has been conducted as an ordered evaluation of n neighboring primitives. Ordered validation is prone to chain failure, which is like a domino effect of incorrect validations: when one validation is incorrect, the following validations are affected and also become incorrect.
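A left-to-right pass of this kind can be sketched as follows. This is a toy illustration, not the classifiers used in the paper: primitives are plain tokens, and `char_confidence` stands in for a trained character classifier (both names are hypothetical).

```python
def ordered_validate(primitives, char_confidence, threshold=0.5):
    """Ordered validation with n = 2: walk left to right and merge two
    neighboring primitives whenever the classifier rates the merge as a
    likely whole character."""
    result, i = [], 0
    while i < len(primitives):
        if i + 1 < len(primitives):
            merged = primitives[i] + primitives[i + 1]
            if char_confidence(merged) >= threshold:
                result.append(merged)
                i += 2          # consume both primitives
                continue
        result.append(primitives[i])
        i += 1
    return result
```

Because each decision shifts the starting point of the next one, a single early wrong merge displaces every later decision, which is the chain failure illustrated in Fig. 1(a).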

Validation with n=2 takes two neighboring primitives and evaluates their combination against a preset threshold. Fig. 1(a) shows an example of chain failure in traditional ordered validation, conducted from left to right. By visual judgement, the first segment does not look like any character of the English alphabet. However, at segmentation point 2 the classifier may recognize the segment as the letter 'u'. The following segment, formed by segmentation points 2 and 3, is then likely to be recognized as the letter 'l' or 'I'. The next segment is highly recognizable as the letter 'o', the segment formed by points 4 and 5 is likely to be recognized as the letter 'r', and the final segment might be regarded as a non-character. In this example, the ordered processing caused the letter 'w' to be split into two characters, and the letter 'n' into 'r' plus a non-character.

Fig. 1(b) shows the unordered nature of binary segmentation, which can reduce the risk of chain failure. In this example, the first cut is most likely made at segmentation point 3 according to the segmentation point selection algorithm; the remaining cuts then proceed in the order of points 4, 2, 5 and 1. Unlike ordered validation, binary segmentation extracts the individual characters correctly after the cut at point 4. Reducing the risk of chain failure improves segmentation accuracy, since overall accuracy is measured by counting bad, over- and under-segmentation errors: the fewer such errors, the higher the overall accuracy. Over-segmentation errors are caused by recognizing a partial character as a correct character during validation; bad and under-segmentation errors are caused by recognizing combinations of unrelated partial characters as a correct character. The proposed binary algorithm reduces chain failure, and hence the number of partial characters and of combinations of unrelated partial characters, so the bad, over- and under-segmentation errors, and with them the overall segmentation error, are reduced.
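Under this error model, overall accuracy follows directly from the three error counts. The exact bookkeeping below is an assumption for illustration (the paper defines its metrics in its experimental section); the function name and signature are hypothetical.

```python
def segmentation_accuracy(total_characters, bad, over, under):
    """Fraction of characters segmented without a bad, over- or
    under-segmentation error (a hypothetical formulation)."""
    errors = bad + over + under
    if errors > total_characters:
        raise ValueError("more errors than characters")
    return (total_characters - errors) / total_characters
```

For instance, 100 characters with 3 bad, 4 over- and 3 under-segmentation errors would give an accuracy of 0.9 under this formulation.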

The segmentation results are sensitive to the terminating-condition parameters, especially the average character width. The main purpose of incorporating the average character width in binary segmentation is to reduce over-segmentation errors: decreasing it increases the number of over-segmentation errors, while increasing it increases the number of under-segmentation errors. The maximum number of segmentations is another terminating-condition parameter; it limits the number of segments in a word, again primarily to reduce over-segmentation errors. If it is large, the over-segmentation error grows; if it is small, the under-segmentation error grows. Segmentation results are also affected by the check for whether a sub-image contains any suspicious segmentation points: a sub-image with none is not segmented further, so a sub-image that contains more than one character component but no suspicious segmentation points will produce under-segmentation errors. In binary segmentation, all parameters are calculated automatically from each input word, not tuned manually.
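Taken together, these conditions amount to a stopping predicate applied to each sub-image. The sketch below is an assumption about how they combine (all names are hypothetical; the paper's exact conditions are given in Section 2):

```python
def should_stop(sub_image_width, num_segments, has_suspicious_points,
                avg_char_width, max_segments):
    """True when a sub-image should not be split any further."""
    if not has_suspicious_points:            # nothing left to cut on
        return True
    if num_segments >= max_segments:         # cap on segments per word
        return True
    if sub_image_width <= avg_char_width:    # already about one character wide
        return True
    return False
```

Each condition trades one error type against another: loosening the width or segment cap reduces under-segmentation at the cost of over-segmentation, as described above.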

Researchers in handwriting recognition have treated recognition in different languages as different problems. Each language has its own character set, and language-specific rules govern how those characters are arranged, so character segmentation requires an understanding of the target language. The proposed binary segmentation algorithm is tailored to segmenting words in English only: the incorporated over-segmentation heuristics are based on characteristics of the English alphabet, and the artificial neural network classifier for character classification is trained only on the 52 English alphabet letters.

The rest of this paper is organized into four sections. Section 2 describes the proposed binary segmentation algorithm in detail and explains how over-segmentation and binary segmentation work together. Section 3 presents the experimental results. An analysis and comparison of the results are presented in Section 4. Finally, Section 5 concludes the paper.


Binary segmentation algorithm

The overview of the proposed over-segmentation and Binary Segmentation Algorithm (BSA) strategy is shown in Fig. 2. A word image is fed into the Suspicious Segmentation Points (SSPs) generation module and one of the SSPs is nominated as a Segmentation Point (SP). BSA is applied at the SP. Each BSA run produces two sub-images. Selection of a sub-image and segmentation of the selected sub-image are repeated until the terminating conditions are satisfied. In the following sub-sections, the details of BSA

Experimental results

This section describes the implementation platform, the database, parameters setup, artificial neural network classifier and the experimental results.

Analysis and discussion

The ultimate purpose of good segmentation results is to increase handwriting recognition accuracy. Although a comparative analysis should be based on recognition performance, few authors publish both their segmentation and recognition results on a benchmark database. Even though handwriting recognition has been an active research area for more than half a century, the maturity of segmentation techniques is still very low. The proposed

Conclusions and future research

In this paper, a novel segmentation paradigm for off-line cursive handwriting recognition has been proposed and investigated. The segmentation paradigm comprises a baseline pixel-based over-segmenter, hole detection, segment foreground pixel comparison and a binary segmentation algorithm. The new segmentation paradigm has been tested on the CEDAR benchmark database, and the proposed segmentation approach exhibits competitive performance in comparison to existing approaches. The proposed approach

Hong Lee received the Bachelor degree in Information Technology from the Queensland University of Technology, Australia, in 2005. His first research study achieved the First Class Honours in Information Technology from CQUniversity, Australia, in 2007 and he is currently undertaking PhD research at the same institution. His research interests include neural networks, language classification and handwriting recognition.

References (29)



Brijesh Verma is a Professor in the School of Information and Communication Technology at Central Queensland University (CQUni), Australia. His main research interests include computational intelligence and pattern recognition. He has published thirteen books and over one hundred and twenty papers in journals and conference proceedings, and has served on the organizing and program committees of over thirty international conferences. He is currently serving on the organizing committee of the IEEE World Congress on Computational Intelligence and on the editorial boards of six international journals, including as Editor-in-Chief of the International Journal of Computational Intelligence and Applications (IJCIA).

1 Tel.: +61 7 4930 9058.
