UTRNet: High-Resolution Urdu Text Recognition in Printed Documents

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)
  • Included in the following conference series: Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

In this paper, we propose a novel approach to the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. Previous works struggle to generalize to the intricacies of the Urdu script and suffer from a lack of sufficient annotated real-world data. To address these limitations, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset of 20,000 lines that closely resembles real-world data, and we correct the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and the online tool is available at abdur75648.github.io/UTRNet.
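
The abstract describes UTRNet as a hybrid CNN-RNN recognizer built on high-resolution, multi-scale feature extraction. For readers unfamiliar with this model family, the sketch below shows a generic CNN-RNN line recognizer with a CTC-style output head. It is not the authors' UTRNet implementation (that code is on the project page); all layer sizes, the class count, and the names used here are illustrative assumptions.

```python
# Minimal sketch of a generic hybrid CNN-RNN text recognizer (CRNN-style),
# illustrating the model family the paper builds on. NOT the authors' UTRNet;
# all layer sizes, names, and the CTC head are illustrative assumptions.
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # CNN backbone: extracts a feature map and collapses the height axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height to 1
        )
        # Bidirectional LSTM models character context along the width axis.
        self.rnn = nn.LSTM(256, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        # Per-timestep classifier over the character set plus a CTC blank.
        self.fc = nn.Linear(2 * hidden, num_classes + 1)

    def forward(self, x):                    # x: (B, 1, H, W) grayscale line image
        f = self.cnn(x)                      # (B, 256, 1, W')
        f = f.squeeze(2).permute(0, 2, 1)    # (B, W', 256) feature sequence
        seq, _ = self.rnn(f)                 # (B, W', 2*hidden)
        return self.fc(seq)                  # (B, W', num_classes + 1) logits for CTC

# Usage: logits = CRNNSketch(num_classes=180)(torch.randn(4, 1, 32, 400))
# Training would typically apply nn.CTCLoss to these per-timestep logits.
```

In models of this kind, the CNN turns the line image into a feature sequence along the width, the bidirectional RNN captures character context in both reading directions, and the CTC loss aligns per-timestep predictions with the unsegmented ground-truth transcription.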



Acknowledgement

We would like to express our gratitude to the Rekhta Foundation and Arjumand Ara for providing us with scanned images, as well as Noor Fatima and Mohammad Usman for their valuable annotations of the UTRSet-Real dataset. Furthermore, we acknowledge the support of a grant from IRD, IIT Delhi, and MEITY, Government of India, through the NLTM-Bhashini project.

Author information

Corresponding author

Correspondence to Abdur Rahman.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rahman, A., Ghosh, A., Arora, C. (2023). UTRNet: High-Resolution Urdu Text Recognition in Printed Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_19

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science, Computer Science (R0)
