Separation of Machine Printed Roman and Gurmukhi Script Words

  • Conference paper
Information Processing and Management (BAIP 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 70)

Abstract

In a multilingual country like India, a single document may contain more than one script. Such a document must have its scripts separated before each portion is passed to the OCR engine for that script. This paper presents a method that identifies the script at the word level in a bilingual document containing Roman and Gurmukhi scripts, and so can separate the English and Punjabi words in a single document. Words are separated using certain features of the Gurmukhi and Roman scripts. The proposed algorithms were tested on words in various font styles and sizes, and the system achieved an overall identification accuracy of 98.78%.
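
The abstract does not say which script features the paper uses. As an illustration only, the sketch below assumes one widely known property of Gurmukhi, the horizontal headline that joins the characters of a word along the top, and tests for it with a horizontal projection of a binarised word image. The function names, the bitmap representation, and both thresholds are hypothetical and are not taken from the paper.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background (assumed representation)


def row_fill_ratios(word: Bitmap) -> List[float]:
    """Fraction of ink pixels in each row (a normalised horizontal projection)."""
    width = len(word[0])
    return [sum(row) / width for row in word]


def classify_word(word: Bitmap, min_fill: float = 0.8, top_fraction: float = 0.5) -> str:
    """Return "gurmukhi" if a near-full-width row of ink lies in the top part
    of the word image, else "roman". Both thresholds are illustrative guesses."""
    if not word or not word[0]:
        raise ValueError("empty word image")
    ratios = row_fill_ratios(word)
    # Only search the upper part of the word, where a headline would sit.
    top_rows = max(1, int(len(word) * top_fraction))
    return "gurmukhi" if max(ratios[:top_rows]) >= min_fill else "roman"


# Toy 5x8 bitmaps: one with a full-width stroke in the second row, one without.
with_headline = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
]
without_headline = [
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]

print(classify_word(with_headline))
print(classify_word(without_headline))
```

A single feature of this kind is unlikely to be sufficient on its own, since Roman capitals with top bars can partly mimic a headline; a practical system would probably combine several features.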


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sharma, D. (2010). Separation of Machine Printed Roman and Gurmukhi Script Words. In: Das, V.V., et al. Information Processing and Management. BAIP 2010. Communications in Computer and Information Science, vol 70. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12214-9_84

  • DOI: https://doi.org/10.1007/978-3-642-12214-9_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12213-2

  • Online ISBN: 978-3-642-12214-9

  • eBook Packages: Computer Science, Computer Science (R0)
