Abstract
In a multilingual country like India, a single document may contain more than one script. The scripts in such a document must be separated before the text is passed to the OCR engine for each script. This paper presents a method that identifies the script at the word level in bilingual documents containing Roman and Gurmukhi scripts, and so separates the English and Punjabi words in a single document. The separation relies on distinguishing features of the Gurmukhi and Roman scripts. The proposed algorithms were tested on words in various font styles and sizes, and the system achieved an overall identification accuracy of 98.78%.
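The abstract does not say which script features are used. As an illustration only, one well-known structural property of Gurmukhi is the headline that joins the tops of most characters in a word, which Roman words lack. The sketch below classifies a binarised word image by looking for an almost fully inked row in its upper part. The function names, the thresholds `min_fill` and `top_fraction`, and the toy images are all assumptions made for this example; this is not the paper's algorithm.

```python
from typing import List

# Binary word image: 1 = ink pixel, 0 = background (assumed already binarised).
Image = List[List[int]]


def row_ink_fractions(word: Image) -> List[float]:
    """Horizontal projection profile: fraction of ink pixels in each row."""
    width = len(word[0])
    return [sum(row) / width for row in word]


def has_headline(word: Image, min_fill: float = 0.8,
                 top_fraction: float = 0.4) -> bool:
    """True if some row in the upper part of the word is almost fully inked.

    min_fill and top_fraction are hypothetical thresholds chosen for the toy
    data below; real scans would need tuning.
    """
    if not word or not word[0]:
        return False
    profile = row_ink_fractions(word)
    top_rows = max(1, int(len(word) * top_fraction))
    return max(profile[:top_rows]) >= min_fill


def classify_word(word: Image) -> str:
    """Label a word image as 'Gurmukhi' (headline found) or 'Roman'."""
    return "Gurmukhi" if has_headline(word) else "Roman"


# Toy 5x8 word images, invented for illustration.
gurmukhi_like: Image = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # headline spanning the word
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
roman_like: Image = [
    [1, 0, 0, 0, 0, 1, 0, 0],  # only ascenders reach the top rows
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
]

print(classify_word(gurmukhi_like))
print(classify_word(roman_like))
```

A single feature like this would misfire on Roman text with underlines, rules, or all-capital words in some fonts, which is presumably why the abstract refers to several features of both scripts.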
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sharma, D. (2010). Separation of Machine Printed Roman and Gurmukhi Script Words. In: Das, V.V., et al. Information Processing and Management. BAIP 2010. Communications in Computer and Information Science, vol 70. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12214-9_84
DOI: https://doi.org/10.1007/978-3-642-12214-9_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12213-2
Online ISBN: 978-3-642-12214-9
eBook Packages: Computer Science (R0)