Abstract
In this paper, we propose a novel approach to printed Urdu text recognition based on high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, achieves state-of-the-art performance on benchmark datasets. Previous works struggle to generalize to the intricacies of the Urdu script and suffer from a lack of sufficient annotated real-world data. To address these limitations, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset of 20,000 lines that closely resembles real-world data. We have also corrected the ground truth of the existing IIITH dataset, making it a more reliable resource for future research, and we provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR on printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research and facilitates the continued advancement of Urdu OCR technology. The project page, with source code, datasets, annotations, trained models, and the online tool, is available at abdur75648.github.io/UTRNet.
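Hybrid CNN-RNN recognizers of this kind are typically trained with a CTC objective, and at inference time the per-frame predictions are collapsed into a character sequence by CTC best-path (greedy) decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation; the blank index and the toy Urdu alphabet are illustrative assumptions.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# Toy alphabet (hypothetical label indices) mapping to Urdu characters.
ALPHABET = {1: "ا", 2: "ر", 3: "د", 4: "و"}

# Per-frame argmax indices as a CRNN might emit them (0 = CTC blank).
frames = [0, 1, 1, 0, 2, 2, 0, 3, 3, 0, 4, 4, 0]
decoded = "".join(ALPHABET[i] for i in ctc_greedy_decode(frames))
print(decoded)  # prints "اردو"
```

Note that a blank between two identical labels keeps both copies (e.g. `[1, 0, 1]` decodes to `[1, 1]`), which is what lets CTC represent doubled characters.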
Acknowledgement
We would like to express our gratitude to the Rekhta Foundation and Arjumand Ara for providing us with scanned images, as well as Noor Fatima and Mohammad Usman for their valuable annotations of the UTRSet-Real dataset. Furthermore, we acknowledge the support of a grant from IRD, IIT Delhi, and MEITY, Government of India, through the NLTM-Bhashini project.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Rahman, A., Ghosh, A., Arora, C. (2023). UTRNet: High-Resolution Urdu Text Recognition in Printed Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4