Characterization and Distinction Between Closely Related South Slavic Languages on the Example of Serbian and Croatian

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9256)

Abstract

The paper proposes a new method for characterizing and distinguishing closely related languages, using Serbian and Croatian as an example. In the first step, the method transforms text in different languages into uniformly coded text. The coding is based on the position of each sign of the script in the text line and on its height. The coded text, treated as a 1-D image, is then subjected to texture analysis, which yields a feature vector of 28 elements. These elements are extracted by co-occurrence texture analysis and adjacent local binary pattern analysis. The feature vector is the input to classification by an extension of a state-of-the-art method called GA-ICDA. As a result, the two closely related languages are correctly distinguished. The method is tested on a database of documents in Serbian and Croatian, and the experiments give promising results.
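
The first two steps of the abstract (coding each sign by its position and height, then analysing the coded line as a 1-D image) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The four height classes, the letter sets assigned to them, the helper names `encode`, `cooccurrence` and `texture_features`, and the choice of four co-occurrence statistics are all assumptions made for illustration. The sketch does not reproduce the paper's 28-element vector, and it omits the adjacent local binary pattern features and the GA-ICDA classifier.

```python
import math

# Hypothetical height classes (an assumption, not taken from the paper):
# 0 = short letter, 1 = ascender, 2 = descender, 3 = full height.
ASCENDERS = set("bdfhkltčćšžđ")
DESCENDERS = set("gpqy")
FULL = set("j")


def encode(text):
    """Map each letter to an assumed height class; skip non-letters."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch.isupper() or ch in ASCENDERS:
            codes.append(1)
        elif ch in DESCENDERS:
            codes.append(2)
        elif ch in FULL:
            codes.append(3)
        else:
            codes.append(0)
    return codes


def cooccurrence(codes, levels=4):
    """Normalized co-occurrence matrix of horizontally adjacent codes."""
    counts = [[0] * levels for _ in range(levels)]
    for a, b in zip(codes, codes[1:]):
        counts[a][b] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """A few Haralick-style statistics of a co-occurrence matrix."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


if __name__ == "__main__":
    coded = encode("Gdje je knjiga?")
    print(coded)
    print(texture_features(cooccurrence(coded)))
```

Because the coding depends only on letter shape, the same pipeline applies unchanged to any text whose signs can be assigned a height class, which is what makes the coded text "uniform" across languages in this sketch.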

Author information

Corresponding author

Correspondence to Darko Brodić.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Brodić, D., Amelio, A., Milivojević, Z.N. (2015). Characterization and Distinction Between Closely Related South Slavic Languages on the Example of Serbian and Croatian. In: Azzopardi, G., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2015. Lecture Notes in Computer Science, vol 9256. Springer, Cham. https://doi.org/10.1007/978-3-319-23192-1_55

  • DOI: https://doi.org/10.1007/978-3-319-23192-1_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23191-4

  • Online ISBN: 978-3-319-23192-1

  • eBook Packages: Computer Science, Computer Science (R0)
