DL Architecture for Indic Scripts

Kompalli, Suryaprakash; Setlur, Srirangaraj; Govindaraju, Venugopal

doi:10.1007/978-3-540-28640-0_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3163)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

In this study, we outline computational issues in the design of a Digital Library (DL) for Indic languages. The complicated character structure of Indic scripts entails novel OCR analysis techniques and user interface (UI) designs. This paper describes a multi-tier software architecture, which provides text and image processing tools as independent, reusable entities. Techniques for measuring and evaluating different stages of an Indic script recognition engine are outlined.

Author information

Authors and Affiliations

CEDAR, UB Commons, 520 Lee Entrance, Suite 202, Amherst, NY, 14228, USA
Suryaprakash Kompalli, Srirangaraj Setlur & Venugopal Govindaraju

Editor information

Editors and Affiliations

Dipartimento di Sistemi e Informatica, Università di Firenze, Via di Santa Marta 3, 50139, Firenze, Italy
Simone Marinai
Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI) GmbH, Kaiserslautern, Germany
Andreas R. Dengel

About this paper

Cite this paper

Kompalli, S., Setlur, S., Govindaraju, V. (2004). DL Architecture for Indic Scripts. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_3

DOI: https://doi.org/10.1007/978-3-540-28640-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive
