Optimal Feature Extraction for Bilingual OCR

Dhanya, D.; Ramakrishnan, A. G.

doi:10.1007/3-540-45869-7_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2423)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages, whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the best set of features cannot be determined through any quantitative measure, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both scripts into subsets by the extraction of certain spatial and structural features. Three features, viz. geometric moments, DCT-based features and wavelet transform-based features, are extracted from the grouped symbols, and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques, principal component analysis, maximization of Fisher’s ratio and maximization of a divergence measure, have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations.
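
As a rough illustration of what estimating such a linear transformation can look like, the sketch below computes a projection matrix in two of the three ways the abstract names: principal component analysis and maximization of Fisher's ratio. It is not the authors' implementation. The function names `pca_transform` and `fisher_transform`, the small ridge term `reg`, and the use of NumPy are all assumptions made for this example, and the divergence-based criterion is not shown.

```python
import numpy as np


def pca_transform(X, k):
    """Return a (d, k) matrix whose columns are the top-k principal directions.

    X is an (n, d) array with one feature vector per row.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    cov = Xc.T @ Xc / (len(X) - 1)               # sample covariance, (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, order]


def fisher_transform(X, y, k, reg=1e-6):
    """Return a (d, k) matrix of directions with the largest Fisher ratio.

    Sw is the within-class scatter and Sb the between-class scatter; the
    columns are leading eigenvectors of Sw^-1 Sb. The ridge `reg` is an
    assumption of this sketch that keeps Sw invertible.
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)
    # Sw^-1 Sb is not symmetric in general, so use the general eigensolver
    # and keep the real parts of the result.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real
```

Given a feature matrix `X` (for instance moment, DCT or wavelet features of the symbols in one subset) and class labels `y`, the reduced representation would be `X @ W`, where `W` is the matrix returned by either function. PCA ignores the labels and keeps directions of largest variance, whereas the Fisher criterion uses the labels to favor directions that separate the classes.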

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
D. Dhanya & A. G. Ramakrishnan

Editor information

Editors and Affiliations

Bell Labs, Lucent Technologies, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Daniel Lopresti
Avaya Labs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
Jianying Hu & Ramanujan Kashi

Cite this paper

Dhanya, D., Ramakrishnan, A.G. (2002). Optimal Feature Extraction for Bilingual OCR. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_3

DOI: https://doi.org/10.1007/3-540-45869-7_3
Published: 09 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive
