Separation of Machine Printed Roman and Gurmukhi Script Words

Sharma, Dharamveer

doi:10.1007/978-3-642-12214-9_84

Part of the book series: Communications in Computer and Information Science (CCIS, volume 70)

Included in the following conference series:

International Conference on Business Administration and Information Processing

Abstract

In a multilingual country like India, a single document may contain more than one script. Such a document must be split by script before its parts are passed to the OCR engine for each script. This paper presents a method for identifying the script at the word level in a bilingual document containing Roman and Gurmukhi scripts, so that English and Punjabi words in the same document can be separated. The separation relies on distinguishing features of the Gurmukhi and Roman scripts. The proposed algorithms were tested on words in various font styles and sizes, and the system achieved an overall identification accuracy of 98.78%.
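
The abstract does not say which features the method uses. As an illustration only, the sketch below classifies a binarized word image using one well-known property of printed Gurmukhi: most characters in a word are joined by a horizontal headline along the top, which Roman words lack. The function names, the 0/1 bitmap representation, the restriction to the top half of the word, and the 0.8 threshold are all assumptions made for this example, not details taken from the paper.

```python
from typing import List

# Assumed representation: a word image as rows of 0/1 values, 1 = ink pixel.
Bitmap = List[List[int]]


def headline_ratio(word: Bitmap) -> float:
    """Return the ink coverage of the densest row in the top half of the word.

    A value near 1.0 means some row in the upper part of the word is almost
    entirely ink, which is what a Gurmukhi-style headline would produce.
    """
    if not word or not word[0]:
        return 0.0
    width = len(word[0])
    top_half = word[: max(1, len(word) // 2)]
    return max(sum(row) for row in top_half) / width


def classify_word(word: Bitmap, threshold: float = 0.8) -> str:
    """Label a word 'gurmukhi' if it has a headline-like row, else 'roman'.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return "gurmukhi" if headline_ratio(word) >= threshold else "roman"


# Toy bitmaps: the first has a full-width top row, the second does not.
gurmukhi_like = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
roman_like = [
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
]

print(classify_word(gurmukhi_like))
print(classify_word(roman_like))
```

A single feature like this would likely misclassify some words, for example Roman words set in all capitals with strong top strokes, so a practical system would presumably combine several features.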

Author information

Authors and Affiliations

Department of Computer Science, Punjabi University, Patiala, Punjab, India
Dharamveer Sharma

Editor information

Editors and Affiliations

Engineers Network, Trivandrum, Kerala, India
Vinu V Das
NSS College of Engineering, Palakkadu, Kerala, India
R. Vijayakumar
Winona State University, Winona, MN, USA
Narayan C. Debnath
Viswajiyithi College Engineering, Muvattupuzha, Kerala, India
Janahanlal Stephen
Jackson State University, Jackson, MS, USA
Natarajan Meghanathan
University of West Indies, Kingston, Jamaica
Suresh Sankaranarayanan
CGM, ACEEE, Pattom, Kerala, India
P. M. Thankachan
University of Indonesia, Depok, Indonesia
Ford Lumban Gaol
College of Engineering, Trivandrum, Kerala, India
Nessy Thankachan

About this paper

Cite this paper

Sharma, D. (2010). Separation of Machine Printed Roman and Gurmukhi Script Words. In: Das, V.V., et al. Information Processing and Management. BAIP 2010. Communications in Computer and Information Science, vol 70. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12214-9_84

DOI: https://doi.org/10.1007/978-3-642-12214-9_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12213-2
Online ISBN: 978-3-642-12214-9
eBook Packages: Computer Science, Computer Science (R0)