
A multi-pronged accurate approach to optical character recognition, using nearest neighborhood and neural-network-based principles

Sādhanā

Abstract

Digital systems play a vital role in applications such as banking, finance, healthcare, manufacturing and security, and their reach and importance continue to grow. In many of these applications, accurately identifying and recognizing a character or digit is essential, especially in banking, finance and other sectors where a single error can cause significant loss or damage. This task is known as the optical character recognition (OCR) problem. This paper makes two contributions. First, we propose a multi-layer perceptron (MLP) neural network architecture, comprising an input layer, hidden layers and an output layer, as an effective method for OCR. The architecture builds a model that learns representations from the input data and then uses these representations to classify unseen data. We compare the performance of the proposed MLP with existing nearest neighbor methods: condensed nearest neighbor (CNN), modified condensed nearest neighbor (MCNN) and other class nearest neighbor (OCNN). For all of these methods, posterior and conditional probabilities of recognition are computed, estimated and validated on test data from the OCR and Pendigits datasets. These posterior probabilities can then be used to estimate the probability of correctly recognizing a newly drawn character or digit. The proposed MLP model outperforms the existing methods. Second, in certain critical applications it is very important to achieve the highest possible accuracy, even at a high cost. To this end, we develop a multi-pronged approach that combines all four methods, in order to improve and estimate accuracy both when the methods agree and when they disagree.
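The condensed nearest neighbor (CNN) baseline named in the abstract reduces the training set to a subset that the 1-NN rule can use in place of the full set. The sketch below follows Hart's classic condensing procedure with Euclidean distance; the function names, the toy data and the sweep-until-stable loop are illustrative assumptions, not the authors' implementation, and MCNN and OCNN differ in how they select prototypes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_label(x: Point, protos: Sequence[Point], labels: Sequence[int]) -> int:
    """Label of the prototype closest to x under Euclidean distance (1-NN rule)."""
    best = min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))
    return labels[best]


def condense(X: Sequence[Point], y: Sequence[int]) -> List[int]:
    """Hart-style condensed nearest neighbor.

    Returns indices of a subset S of the training set such that the 1-NN rule
    on S labels every training sample not in S correctly."""
    chosen = [0]  # seed S with the first training sample
    changed = True
    while changed:  # keep sweeping until a full pass adds nothing
        changed = False
        for i, (xi, yi) in enumerate(zip(X, y)):
            if i in chosen:
                continue
            protos = [X[j] for j in chosen]
            labels = [y[j] for j in chosen]
            if nearest_label(xi, protos, labels) != yi:
                chosen.append(i)  # misclassified: absorb it into S
                changed = True
    return chosen


# Toy 2-D data with two well-separated classes (illustrative only).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
y = [0, 0, 0, 1, 1, 1]
S = condense(X, y)
print(S)  # [0, 3]
print(nearest_label((4.9, 5.0), [X[j] for j in S], [y[j] for j in S]))  # 1
```

Here two prototypes replace six training samples; on real OCR data the saving in distance computations at classification time is the point of condensing.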
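The abstract does not define its posterior and conditional probabilities precisely. One common reading, sketched here as an assumption, estimates both from confusion-matrix counts on labelled test data: the conditional probability P(predicted = j | true = i) normalizes each row, while the posterior P(true = i | predicted = j) normalizes each column and answers "given that the classifier output j, how likely is the character really i?". The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def recognition_probabilities(
    truth: Sequence[int], pred: Sequence[int]
) -> Tuple[Dict[Tuple[int, int], float], Dict[Tuple[int, int], float]]:
    """Estimate recognition probabilities from labelled test data.

    conditional[(i, j)] ~ P(predicted = j | true = i)
    posterior[(i, j)]   ~ P(true = i | predicted = j)
    Pairs (i, j) never observed are omitted and have estimated probability 0."""
    joint = Counter(zip(truth, pred))  # n(i, j): confusion-matrix counts
    n_true = Counter(truth)            # row totals n(i, .)
    n_pred = Counter(pred)             # column totals n(., j)
    conditional = {(i, j): c / n_true[i] for (i, j), c in joint.items()}
    posterior = {(i, j): c / n_pred[j] for (i, j), c in joint.items()}
    return conditional, posterior


# Illustrative test labels and predictions for a two-class problem.
truth = [0, 0, 0, 1, 1, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
cond, post = recognition_probabilities(truth, pred)
print(cond[(1, 1)], post[(1, 1)])  # 0.6 0.75
```

The two numbers differ because class 1 is recognized in 3 of its 5 test samples, while 3 of the 4 samples labelled 1 by the classifier are truly class 1.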
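The abstract describes the multi-pronged approach only as combining the four methods to improve and estimate accuracy when they concur or not. A minimal sketch under that description, assuming majority voting, is shown below: each sample's combined label is the most common prediction, and held-out data gives an empirical accuracy for each level of agreement, so unanimous predictions can be trusted more than split ones. The function names and the tie-breaking rule (the earliest-listed classifier wins among tied labels) are hypothetical choices, not the authors' method.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def combine(preds: Sequence[int]) -> Tuple[int, int]:
    """Majority vote over one sample's predictions from several classifiers.

    Returns (winning label, number of classifiers that voted for it).
    Ties go to the tied label predicted by the earliest classifier in `preds`."""
    counts = Counter(preds)
    top = max(counts.values())
    label = next(p for p in preds if counts[p] == top)
    return label, top


def accuracy_by_agreement(
    all_preds: Sequence[Sequence[int]], truth: Sequence[int]
) -> Dict[int, float]:
    """Accuracy of the majority vote on held-out data, grouped by vote count."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for preds, t in zip(all_preds, truth):
        label, votes = combine(preds)
        totals[votes] += 1
        hits[votes] += int(label == t)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


# Hypothetical validation predictions from four classifiers, one row per sample.
val_preds = [
    [3, 3, 3, 3],
    [7, 7, 7, 7],
    [1, 7, 1, 1],
    [4, 9, 9, 9],
    [5, 6, 5, 8],
    [2, 0, 0, 2],
]
truth = [3, 7, 1, 4, 5, 0]
print(accuracy_by_agreement(val_preds, truth))  # {2: 0.5, 3: 0.5, 4: 1.0}
```

At prediction time, the vote count of a new sample can be looked up in this table to estimate how far to trust the combined label; a real system would need far more validation samples per agreement level than this toy example.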


Figures 1–6



Author information


Corresponding author

Correspondence to G Kishor Kumar.


About this article


Cite this article

Kumar, G.K., Kumar, R.R., Chakka, R. et al. A multi-pronged accurate approach to optical character recognition, using nearest neighborhood and neural-network-based principles. Sādhanā 46, 189 (2021). https://doi.org/10.1007/s12046-021-01703-3



  • DOI: https://doi.org/10.1007/s12046-021-01703-3
