Character-Level Alignment Using WFST and LSTM for Post-processing in Multi-script Recognition Systems - A Comparative Study

Al Azawi, Mayce; Ul Hasan, Adnan; Liwicki, Marcus; Breuel, Thomas M.

doi:10.1007/978-3-319-11758-4_41

Conference paper
First Online: 10 October 2014

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8814)

Abstract

In this paper, two new techniques for correcting OCR errors are proposed: recurrent neural networks with Long Short-Term Memory (LSTM), and Weighted Finite State Transducers (WFSTs) with context-dependent confusion rules. Both methods are applied to OCR results for Latin and Urdu script; Urdu script in particular is very challenging for OCR. To build an error model using context-dependent confusion rules, the OCR confusions that appear in the recognition outputs are translated into edit operations using the Levenshtein edit distance algorithm. The new LSTM model avoids the calculations that occur in searching the language model, and it also enables the language model to correct unseen incorrect words. Both approaches are generic and language independent. The proposed supervised LSTM model is compared with the context-dependent error model and with state-of-the-art single rule-based methods. On Latin script, the error rate of the LSTM is 0.48 %, that of the error model is 0.68 %, and that of the rule-based model is 1.0 %. On the Urdu test set, the error rate of the LSTM model is 1.58 %, while that of the error model is 3.8 % and that of the OCR recognition results is 6.9 %. The LSTM showed the best performance on both Latin and Urdu script. These experiments show that LSTM performs very well in language-related tasks, especially post-processing.
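The step of translating OCR confusions into edit operations can be illustrated with a small sketch. This is not the authors' implementation: the function names `edit_operations` and `confusion_counts` are hypothetical, unit edit costs are assumed, and "context" is simplified here to the single preceding OCR character, which may differ from the context definition used in the paper.

```python
from collections import Counter


def edit_operations(ocr: str, truth: str):
    """Align an OCR string with its ground truth using Levenshtein distance.

    Returns a list of (op, ocr_char, truth_char) tuples, where op is one of
    "match", "sub", "del" (spurious OCR character) or "ins" (character the
    OCR missed). Unit costs are an assumption of this sketch.
    """
    n, m = len(ocr), len(truth)
    # dist[i][j] = edit distance between ocr[:i] and truth[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ocr[i - 1] == truth[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion of an OCR character
                dist[i][j - 1] + 1,         # insertion of a truth character
            )

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ocr[i - 1] == truth[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                op = "match" if cost == 0 else "sub"
                ops.append((op, ocr[i - 1], truth[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ocr[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", truth[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def confusion_counts(pairs):
    """Count confusions keyed by (previous OCR char, OCR char, truth char).

    The one-character left context is a simplification chosen for this
    sketch; "^" marks the start of the string.
    """
    counts = Counter()
    for ocr, truth in pairs:
        prev = "^"
        for op, o, t in edit_operations(ocr, truth):
            if op != "match":
                counts[(prev, o, t)] += 1
            if o:
                prev = o
    return counts


rules = confusion_counts([("c1ean", "clean"), ("c1ear", "clear")])
```

In this toy example both pairs contain the same confusion, so `rules` holds a single entry saying that `1` following `c` should have been `l`. Counts of this kind could then be turned into weights for an error model, although how the paper derives its WFST weights is not stated in the abstract.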

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence, University of Kaiserslautern, 67663, Kaiserslautern, Germany
Mayce Al Azawi, Adnan Ul Hasan, Marcus Liwicki & Thomas M. Breuel

Corresponding author

Correspondence to Mayce Al Azawi.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Porto, Porto, Portugal
Aurélio Campilho
Dept. of Electrical and Computer Eng., University of Waterloo, Waterloo, Ontario, Canada
Mohamed Kamel

Cite this paper

Al Azawi, M., Ul Hasan, A., Liwicki, M., Breuel, T.M. (2014). Character-Level Alignment Using WFST and LSTM for Post-processing in Multi-script Recognition Systems - A Comparative Study. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8814. Springer, Cham. https://doi.org/10.1007/978-3-319-11758-4_41

DOI: https://doi.org/10.1007/978-3-319-11758-4_41
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11757-7
Online ISBN: 978-3-319-11758-4
eBook Packages: Computer Science, Computer Science (R0)
