Abstract
In this paper, we propose a novel approach to printed Urdu text recognition based on high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, achieves state-of-the-art performance on benchmark datasets. Previous works struggle to generalize to the intricacies of the Urdu script and suffer from a lack of sufficient annotated real-world data. To address these limitations, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset of 20,000 lines that closely resembles real-world data. We have also corrected the ground truth of the existing IIITH dataset, making it a more reliable resource for future research, and we provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR on printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research and facilitates the continued advancement of Urdu OCR technology. The project page, with source code, datasets, annotations, trained models, and the online tool, is available at abdur75648.github.io/UTRNet.
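Hybrid CNN-RNN recognizers of this kind are typically trained with a CTC objective, and at inference time the per-frame predictions are collapsed into a character sequence by CTC best-path (greedy) decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation; the blank index and the toy Urdu alphabet are illustrative assumptions.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# Toy alphabet (hypothetical label indices) mapping to Urdu characters.
ALPHABET = {1: "ا", 2: "ر", 3: "د", 4: "و"}

# Per-frame argmax indices as a CRNN might emit them (0 = CTC blank).
frames = [0, 1, 1, 0, 2, 2, 0, 3, 3, 0, 4, 4, 0]
decoded = "".join(ALPHABET[i] for i in ctc_greedy_decode(frames))
print(decoded)  # prints "اردو"
```

Note that a blank between two identical labels keeps both copies (e.g. `[1, 0, 1]` decodes to `[1, 1]`), which is what lets CTC represent doubled characters.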
Acknowledgement
We would like to express our gratitude to the Rekhta Foundation and Arjumand Ara for providing us with scanned images, as well as Noor Fatima and Mohammad Usman for their valuable annotations of the UTRSet-Real dataset. Furthermore, we acknowledge the support of a grant from IRD, IIT Delhi, and MEITY, Government of India, through the NLTM-Bhashini project.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Rahman, A., Ghosh, A., Arora, C. (2023). UTRNet: High-Resolution Urdu Text Recognition in Printed Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4