Characterization and Distinction Between Closely Related South Slavic Languages on the Example of Serbian and Croatian

Brodić, Darko; Amelio, Alessia; Milivojević, Zoran N.

doi:10.1007/978-3-319-23192-1_55

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9256)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

3067 Accesses
8 Citations

Abstract

The paper proposes a new method for characterizing and distinguishing closely related languages, using Serbian and Croatian as an example. In the first step, the method transforms text in different languages into a uniformly coded text. The coding is based on the position of each character (script sign) in the text line and on its height. The coded text, treated as a 1-D image, is then subjected to texture analysis, from which a 28-element feature vector is built. These 28 elements are extracted from co-occurrence texture analysis and adjacent local binary pattern analysis. The feature vector is the starting point for classification by an extension of a state-of-the-art method called GA-ICDA. As a result, the closely related languages are correctly distinguished. The method is tested on a database of documents in Serbian and Croatian. The experiments give promising results.
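The coding and co-occurrence steps summarized above can be illustrated with a minimal Python sketch. The letter-to-class table is an assumption. It uses a simplified four-class scheme (base, ascender, descender and full letters, coded 0 to 3) and is not the authors' exact coding. The helper names `code_text`, `cooccurrence`, `haralick_stats` and `feature_vector` are hypothetical. The sketch computes a 4×4 co-occurrence matrix of adjacent codes at offset 1, plus four Haralick-style statistics, for 20 values in total. It does not reproduce the adjacent local binary pattern features or the paper's 28-element vector.

```python
import math
from collections import Counter

# Hypothetical script-type table (an assumption, not the authors' exact coding):
# 0 = base letter (x-height zone only), 1 = ascender letter,
# 2 = descender letter, 3 = full letter (ascender and descender zones).
SCRIPT_TYPE = {}
for letters, code in (("acemnorsuvwxz", 0), ("bdfhikltčćšžđ", 1), ("gpqy", 2), ("j", 3)):
    for ch in letters:
        SCRIPT_TYPE[ch] = code
N_CODES = 4


def code_text(text):
    """Map each known Latin letter to its script-type code; skip everything else."""
    codes = []
    for ch in text:
        low = ch.lower()
        if low not in SCRIPT_TYPE:
            continue  # spaces, punctuation, digits and unlisted letters (e.g. Cyrillic)
        # Simplification: every capital letter is treated as an ascender letter.
        codes.append(1 if ch.isupper() else SCRIPT_TYPE[low])
    return codes


def cooccurrence(codes, n=N_CODES):
    """Normalized co-occurrence matrix of horizontally adjacent codes (offset 1).

    Word gaps are not marked, so pairs spanning a space are counted too.
    """
    counts = Counter(zip(codes, codes[1:]))
    total = sum(counts.values())
    if total == 0:
        return [[0.0] * n for _ in range(n)]
    return [[counts[(i, j)] / total for j in range(n)] for i in range(n)]


def haralick_stats(p):
    """Energy, entropy, contrast and homogeneity of a normalized co-occurrence matrix."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return [energy, entropy, contrast, homogeneity]


def feature_vector(text):
    """16 co-occurrence probabilities followed by 4 statistics (20 values)."""
    p = cooccurrence(code_text(text))
    return [v for row in p for v in row] + haralick_stats(p)
```

For example, `feature_vector("Dobar dan, kako ste?")` yields a 20-value list that any classifier could consume. In the paper, that classification step is done by the GA-ICDA extension, which this sketch does not attempt to implement.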



Author information

Authors and Affiliations

Technical Faculty in Bor, University of Belgrade, V.J. 12, 19210, Bor, Serbia
Darko Brodić
Institute for High Performance Computing and Networking, National Research Council of Italy, CNR-ICAR, Via P. Bucci 41C, 87036, Rende (CS), Italy
Alessia Amelio
College of Applied Technical Sciences, Aleksandra Medvedeva 20, 18000, Niš, Serbia
Zoran N. Milivojević


Corresponding author

Correspondence to Darko Brodić.

Editor information

Editors and Affiliations

University of Malta, Msida, Malta
George Azzopardi
University of Groningen, Groningen, The Netherlands
Nicolai Petkov

About this paper

Cite this paper

Brodić, D., Amelio, A., Milivojević, Z.N. (2015). Characterization and Distinction Between Closely Related South Slavic Languages on the Example of Serbian and Croatian. In: Azzopardi, G., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2015. Lecture Notes in Computer Science, vol 9256. Springer, Cham. https://doi.org/10.1007/978-3-319-23192-1_55


DOI: https://doi.org/10.1007/978-3-319-23192-1_55
Published: 25 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23191-4
Online ISBN: 978-3-319-23192-1
eBook Packages: Computer Science, Computer Science (R0)
