Elsevier

Pattern Recognition

Volume 35, Issue 11, November 2002, Pages 2355-2364

Robust vision-based features and classification schemes for off-line handwritten digit recognition

https://doi.org/10.1016/S0031-3203(01)00228-X

Abstract

We use well-established results in biological vision to construct a model for handwritten digit recognition. We show empirically that the features extracted by our model are linearly separable over a large training set (MNIST). Using only a linear discriminant system on these features, our model is relatively simple yet outperforms other models on the same data set. In particular, the best result is obtained by applying triowise linear support vector machines with soft voting on vision-based features extracted from deslanted images.

Introduction

Automated handwritten character recognition has been an active area of research and development for at least two decades [1], [2], [3], [4], [5], [6], [7], [8]. The literature on this topic alone is vast, with a large variety of feature extraction and classification techniques published every year. Features extracted range from geometric moments to contours and curvatures, while classification techniques range from template matching to neural networks. Some draw inspiration from biological systems, while others are based on statistics or geometry.

There are two main approaches to feature extraction. The more traditional approach is to handcraft the feature extraction process. The other approach is to present the raw input to a learning algorithm, which discovers whatever features are inherent in the domain. Each approach has its own merits and weaknesses. In the former approach, the main difficulty lies in determining the appropriate class of features to extract as well as in extracting those features in a robust and reliable way. Automated learning of features, on the other hand, is feasible only when a large number of samples is available for each class. Hence, it may not be feasible for Kanji or Chinese characters, where the number of samples for each class is relatively small. In addition, in automated feature learning models such as neural networks, it is often difficult to analyze or even decipher the features learnt, which are in turn constrained by the activation functions or the learning algorithm itself. For example, features that are computed via non-differentiable or discontinuous functions (such as max, min, median, etc.) cannot be learnt by gradient descent. This is why, despite the availability of feature learning algorithms, the design of feature extractors continues to be an active area of research.

We set out to develop a handwritten digit recognition system that extracts features along the following principles:

(1) Biological basis: This is an old but reasonably successful principle that has been widely adopted in computer vision. After all, the biological visual system is the most robust recognition system we know, and hence it pays to emulate it wherever possible. In our model, we attempt to cover as many types of features as possible that are known to be extracted by the biological system.

(2) Linear separability: A crucial requirement for feature extraction is to minimize the within-class variability and enhance the between-class variability [9]. A qualitative measure of this requirement is linear separability. If the feature set is linearly separable over the training data, we need only linear classifiers, which are simpler, faster to train, and less prone to the problems (such as overfitting and local minima) that usually plague non-linear classifiers. In handwritten digit recognition, unless the database is very small (say, 1000 samples), it is extremely difficult to achieve linear separability due to the large number of variations in writing style, stroke thickness, skew, orientation, etc. Hence, it is significant that our model has achieved this goal for a large data set of 60 000 samples.

(3) Clear semantics: It is often desirable to know the meaning of the features, either for explanatory purposes or to facilitate further analysis. By explicitly extracting well-defined features, our model achieves semantic clarity.
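
To make principle (2) concrete, the sketch below shows one simple way to test empirically whether a two-class feature set is linearly separable: train a perceptron and check whether it reaches zero training errors. This is only an illustration under stated assumptions and not the procedure used in the paper; the function name `perceptron_separates`, the epoch limit and the toy data are hypothetical. A return value of False means only that no separating hyperplane was found within the epoch limit.

```python
def perceptron_separates(samples, labels, max_epochs=1000):
    """Return True if a perceptron reaches zero training errors.

    samples: list of feature vectors (lists of floats)
    labels:  list of class labels, each +1 or -1
    Hypothetical helper for illustration only.
    """
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                # Standard perceptron update: move the hyperplane towards x.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                errors += 1
        if errors == 0:
            return True  # a separating hyperplane was found
    return False  # none found within max_epochs


# Toy data (assumed for illustration): AND is linearly separable, XOR is not.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
and_labels = [-1, -1, -1, 1]
xor_labels = [-1, 1, 1, -1]

print(perceptron_separates(points, and_labels))
print(perceptron_separates(points, xor_labels))
```

For a multiclass problem such as digit recognition, the same check could be applied to each one-versus-rest or pairwise split of the training set.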

As in traditional pattern recognition systems, our model consists of two main modules (see Fig. 1): a feature extractor that generates a feature vector from the raw pixel map, and a feature classifier that outputs the class based on the feature vector. Despite its traditional structure, our vision-based model has achieved state-of-the-art performance in handwritten digit recognition, even surpassing the current record accuracy on the MNIST data set. A significant contribution of our work is the extraction of robust features that are linearly separable over a large set of training data in a highly non-linear domain. Another contribution is a novel triowise system of combining sub-domain classifiers, which gives the best performance so far among multiclass classification schemes.

The next two sections describe the feature extraction and classification processes, respectively. Experiments are then conducted with the model on handwritten digits, and results are compared with those in other works. This paper is an enhancement and extension of the work in Ref. [10].


Feature extraction

A crucial step in the design of the feature extractor involves deciding what features to extract, based upon the principles outlined previously. The biological visual system is known to extract a wide variety of local spatial features, such as edges, lines, and corners of various orientations, lengths and widths [11], [12], [13], [14], [15]. We chose to detect edge and corner orientations, as these are more relevant to the domain. In addition, the visual system distinguishes between bright and

Feature classification

We experimented with various types of classifiers: linear discriminant systems (one-per-class, pairwise, triowise) and k-nearest neighbor classifiers (Euclidean distance, cosine similarity).
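
As a rough illustration of how pairwise linear discriminants can be combined into a multiclass decision, the sketch below trains nothing and simply applies hard majority voting over given discriminants. It is an assumption-laden simplification: the snippet above does not describe the triowise scheme or the soft voting rule, so neither is reproduced here, and the names `pairwise_vote`, `discriminants` and `classes`, as well as the one-dimensional toy weights, are hypothetical.

```python
from itertools import combinations


def pairwise_vote(x, discriminants, classes):
    """Classify x by hard voting over pairwise linear discriminants.

    discriminants[(a, b)] is a (weights, bias) pair; a positive score
    is a vote for class a, any other score is a vote for class b.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        weights, bias = discriminants[(a, b)]
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        votes[a if score > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])


# Toy 1-D problem (assumed): classes 0, 1 and 2 lie near x = 0, 1 and 2.
classes = [0, 1, 2]
discriminants = {
    (0, 1): ([-1.0], 0.5),  # boundary at x = 0.5
    (0, 2): ([-1.0], 1.0),  # boundary at x = 1.0
    (1, 2): ([-1.0], 1.5),  # boundary at x = 1.5
}

print(pairwise_vote([0.1], discriminants, classes))
print(pairwise_vote([1.1], discriminants, classes))
print(pairwise_vote([1.9], discriminants, classes))
```

With ten digit classes, a pairwise scheme of this kind needs 45 discriminants, one for each unordered pair of classes.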

Experiments on handwritten digits

We tested our model on the MNIST database of handwritten digits, since many methods have been tested on this database [4], [5] and it therefore serves as a good basis for comparison. This database, which can be downloaded from the AT&T Laboratories’ research website, consists of 60 000 training samples and 10 000 test samples, and was constructed from NIST's Special Database 1 and Special Database 3 [38]. The original binary images from NIST were size-normalized to fit in a 20×20 pixel box

Conclusion

We have developed a vision-based handwritten digit recognition system, which extracts features that are biologically plausible, linearly separable and semantically clear. With good features, we need only a relatively simple feature classifier that trains fast and gives excellent classification performance.

Possible future enhancements include yet further preprocessing for increased normalization, adding other types of features, and extension to multi-resolution modeling. The last is especially


References (42)

  • A.A. Verikas et al., Investigation of a number of character recognition algorithms
  • P.A. Devijver et al., Pattern Recognition: A Statistical Approach (1982)
  • L.-N. Teow, K.-F. Loe, Handwritten digit recognition with a novel vision model that extracts linearly separable...
  • S. Coren et al., Sensation and Perception (1994)
  • D.H. Hubel et al., Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, J. Physiol. (1962)
  • D.H. Hubel et al., Receptive fields and functional architecture of monkey striate cortex, J. Physiol. (1968)
  • R. Sekuler et al., Perception (1994)
  • H.R. Wilson, D. Levi, L. Maffei, J. Rovamo, R. DeValois, The perception of form: retina to striate cortex, in: L....
  • A. Fiorentini, G. Baumgartner, S. Magnussen, P.H. Schiller, J.P. Thomas, The perception of brightness and darkness:...
  • P.H. Schiller et al., Functions of the ON and OFF channels of the visual systems, Nature (1986)
  • I. Biedermann, Recognition-by-components: a theory of human image understanding, Psychol. Rev. (1987)

    About the Author—LOO-NIN TEOW is a doctoral student at the National University of Singapore, having obtained his B.Sc. and M.Sc. degrees in Computer Science from the same university in 1992 and 1997, respectively. Prior to his doctoral studies, he worked for 6 years at Kent Ridge Digital Labs, a research institute in Singapore; his last-held appointment there being a Research Associate. His areas of interest include pattern classification, machine learning, computer vision, and uncertainty reasoning.

    About the Author—KIA-FOCK LOE is an associate professor in the Department of Computer Science at the National University of Singapore. He received his Bachelor's degree and M.Sc. from Nanyang University in 1973 and 1977, respectively. He completed his Doctorate of Science at Tokyo University in 1985. His research interests are neural networks, machine learning, pattern recognition and geometric modeling.
